

TRENDS IN HIGH SCHOOL ACHIEVEMENT


March 7, 2001 


Final 


This Document Contains Advice to 
the Minister of Learning 


Trends in High School Achievement 


Prepared by System Improvement and Reporting Division, Alberta Learning
March 15, 2001.


For more information, contact Dennis Belyk, Senior Manager (422-3226). 


TABLE OF CONTENTS 


Executive Summary
Introduction
Purpose
Alberta Learning Business Plan Links
Project Management
Alberta Context
Diploma Examination Program
National and International Studies
Introduction of a New ‘Anchor Test’ Achievement-Over-Time Design
Changing Demographics/Expectations
Project Details
Opportunities/Constraints/Limitations
Components of the Overall Study
Findings in Relation to the Research Questions
Linking of Results/Conclusions
Summary Table of Project Findings
Recommendations
Appendices:
Appendix 1: Achievement of Alberta Students ...
Appendix 2-1: Special Study (Mathematics 30 Written-Response)
Appendix 2-2: Comparison of Students’ Writing (English 30)
Appendix 2-3: Conventions of Language Study (English 30)
Appendix 3: Cognitive Analysis of Specific Examination ...
Appendix 4: Probe into the Perceptions of High School Teachers and Post-Secondary Instructors Regarding the ‘Preparedness’ of Students for Post-Secondary Studies


Executive Summary 


This study emerged from a desire to better understand the nature of changing expectations and 
standards in relation to how well high school students are being prepared for post-secondary 
studies. The underlying concern that standards and expectations have been eroding over the 
years was central to the study. 


The provincial diploma examinations, previously conducted grade 12 achievement-over-time 
studies and the perceptions of high school teachers and post-secondary instructors were 
identified as key features for the study. Where possible, data from as early as 1985 was drawn 
upon to explore expectations and standards. For example, a Cognitive Demand Analysis¹ of
selected diploma examinations from 1985, 1990 and 2000 was conducted to gauge changes. 


Contractors were engaged to undertake key components of the study. The chart below shows a 
summary of the findings that are more fully described in this report and the appendices. 


Summary Table of Project Findings 

Study/Report                  Subject                          Result in 2000**

Existing AOT Reports*         Chemistry 30                     No data available
                              English 30                       Little difference (1989 to 2000)
                              Mathematics 30                   No data available

Additional AOT 2000 Studies   English 30 WR                    Slight increase (1991 to 2000)
                              Eng. 30 Conventions of Lang.     Slight decrease (1993 to 2000)
                              Mathematics 30                   ? Curriculum/exam changes

Cognitive Demand Analysis     Chemistry 30                     Slight increase (1985 to 2000)
                              English 30                       Most increase from 1985 to 1990
                              Mathematics 30                   Same 1985 to 2000

Post-Secondary Instructor/                                     Preliminary findings:
High School Teacher           Post-Secondary Instructors       Less than 50% say prepared
Perceptions Probe             Teachers (Grade 12)              + More than 50% say prepared

*AOT — Achievement Over Time 

** = ‘no change’; + ‘increase’; - ‘decrease’; ? ‘couldn’t determine’ 


The major finding in this project is that there has not been a decline in standards and 
expectations over the last 15 years. In the area of English 30, use of conventions of language, 
study participants noted that there were slightly more errors in 2000. However, in 2000, students 
were asked to write in more sophisticated ways, take risks and grapple with increasingly 
complex ideas. In all other areas, standards and expectations have been maintained or there has 
been a slight increase.


¹ Cognitive Demand Analysis: A process that gauged the cognitive requirements of each diploma examination noted using criteria
developed through the School Achievement Indicators Program for Mathematics, Science, Reading and Writing. See Appendix 3 for
details.


Introduction 


Purpose 


The Trends in High School Achievement/Post-Secondary ‘Preparedness’ Study was designed to 
address questions related to how student achievement and standards in selected grade 12 subject 
areas have changed, if at all, over the last 15 years. More fundamentally, the project needed to 
address whether there has been a decline in standards and expectations over the last 15 years. 
Also, the study was designed to probe the perceptions of grade 12 teachers and post-secondary
instructors regarding how well students are being prepared for further post-secondary study. 
Chemistry 30, English 30 and Mathematics 30 were selected as the primary focus for the study. 
Specifically, the following questions guided the design of the study: 


• Has student achievement in English 30 and Mathematics 30 changed over the last 10
years?

• Have cognitive demands in Grade 12 examinations in Chemistry 30, English 30 and
Mathematics 30 changed over the last 15 years?

• What are the perceptions of grade 12 teachers and post-secondary instructors relative to
the level of ‘preparedness’ of grade 12 students entering post-secondary studies?


Alberta Learning Business Plan Links 


This study links to two goals in the 2000-2003 Business Plan: 


Goal 2: Excellence in Learner Achievement 

Outcome: Learners demonstrate high standards across a full range of areas 

Goal 5: Highly Responsive and Responsible Ministry 

Outcome: The Ministry demonstrates leadership and continuous improvement in
administrative and business processes and practices


Project Management


This project was managed by System Improvement Group, System Improvement and Reporting, 
Alberta Learning, with strong support from the Learner Assessment Branch (LAB), Alberta 
Learning. LAB expertise and support in conducting qualitative achievement-over-time studies in 
July 2000 was critical to this project. The other components of this project were achieved 
through the use of contractors. 


A small advisory committee was established to provide guidance. The committee met twice, 
once in May to consider the project plan and again in October to advise on overall project 
recommendations. 


Alberta Context 


Diploma Examination Program 


The Grade 12 Diploma Examinations Program, introduced in the early 1980s, is designed to 
develop and maintain excellence in educational standards by certifying academic achievement. 
New diploma examinations are developed annually for each January and June administration. 
Student achievement results are reported publicly following each administration. 


It is particularly helpful to the department to know if there have been changes in the levels of 
student achievement over time and the extent to which expectations and standards have remained 
the same or changed. Confounding this challenge are changes to curriculum, improvements in 
the design of diploma examinations and a changing student population. Even with these 
challenges, LAB has demonstrated an ongoing commitment to tracking achievement over time 
by implementing a number of different studies. These studies have been formally reported in 
annual reports or special study reports prepared by LAB. 


A significant opportunity presented through the Trends in High School Achievement/Post-Secondary
‘Preparedness’ study was to:


• build upon the previous achievement-over-time study successes; for example, a qualitative
Mathematics 30 written-response Achievement-Over-Time study was undertaken, modeled
on the English 30 qualitative study design;

• repeat previously successful qualitative studies in English which would link results in 2000
to the early 1990s; and,

• consider new ways of approaching the question of standards; for example, conducting a
‘cognitive demand’ analysis of select diploma examinations.


National and International Studies 


The achievement of students in Alberta is also informed by results from national and 
international studies in which Alberta participates. The School Achievement Indicators Program 
(SAIP) and the Third International Mathematics and Science Study (TIMSS) are the most recent 
studies in which Alberta has participated. These studies provide an indication as to how well 
students from across Canada and other countries achieve. They provide valuable information in
understanding Alberta standards in a broader context. The contractor that reviewed the 
Achievement-Over-Time reports undertook a review of these results. 


Introduction of a New ‘Anchor Test’ Achievement-Over-Time Design


Over the last 10 years, the LAB has made efforts to determine whether or not there has been any
change in achievement over time. Anchor tests have been administered to selected groups of
students who were also writing the diploma examinations. These tests have been used to equate
different June examinations. Although this process has given some useful information, it has not
been entirely satisfactory. Since the anchor tests have been given as field tests, student
motivation is not the same as for examinations. Also, the testing has interfered with the
field-testing process, and the anchor tests are necessarily shorter than the examinations, reducing their
reliability. Curricular changes have also resulted in modifications to the anchor tests,
complicating their use in equating.


In February 2000, the Executive Team, Alberta Learning, approved a new methodology to determine
any changes in student performance on diploma examinations. It includes developing and 
administering ‘anchor’ diploma examinations in the January and June administrations. This 
design will support identifying variation in difficulty of diploma examinations by comparing 
changes in student performance on the anchor examinations and diploma examinations. The 
development and implementation of this design is planned from 2001 to 2002. 


There are several key requirements specified for this plan: 


• Anchor tests for respective subjects will need to last a minimum of five years; courses
currently scheduled for a change in curriculum will wait until that curriculum change has
occurred.

• Anchor tests will meet the same standards as diploma examinations.

• Major curriculum change will result in anchor test redevelopment; minor curriculum changes
may only require slight modifications.

• Security of anchor tests will be high.


Two issues are specifically cited that will need to be addressed to ensure validity of the results. 
First, randomly selected samples will be required for the ‘anchor examinations’ to ensure results 
can be generalized to the entire ‘writing’ population. Second, the importance of student 
motivation in ensuring valid results is noted along with the need to consider alternatives that will
address this issue. 


Changing Demographics/Expectations 


In recent years, more Alberta students have been completing high school and meeting basic 
requirements for further post-secondary study. This is largely due to changing expectations for 
the need for education to secure employment or to engage in post-secondary studies. Not only 
do students stay in school longer, some students return to high school after a brief absence to 
complete high school studies. Statistics Canada data shows that for Alberta in 1961, for youth 
aged 15 and over, 37.1% had achieved an education level of less than grade 9 (for Canada the 
figure was 44.1%); that figure improves for Alberta in 1991 with a drop to 9.1% (for Canada the 
figure drops to 14.3%)*. The per cent of Alberta’s population aged 15 and over with a university 
degree in 1961 was 3.0% (Canada 2.9%); in 1991, the figure jumps to 11.9% in Alberta (Canada 
11.4%). 


Historical enrolment patterns in Grade 1-12 schools in Alberta also reflect the
‘stay-in-school-longer’ phenomenon. Some additional information related to enrolment is shown below.


* Taken from Statistics Canada: Education Attainment and School Attendance, 1993, pages 11 and 20.


Select Enrolment Information Taken from Annual Education Reports 


Grade/Year             1930      1960      1990      2000
27,307
Grade 9                9179      19,161    35,566
48,434
Grade 11               4200      13,244    33,995    46,013
Grade 12               1596      11,291    41,521    49,285**
109%
% (Gr.12 to Gr.9)      116%
% (Gr.12 to Gr.1)


* The 2000 Grade 1 population is lower than 10 years ago. A possible explanation is that the ‘baby boom echo
bulge’ is slowly moving through the grade levels with population declines being noticed at the early grade levels.
** This figure represents grade 12 students who are younger than 20 (maintains greater consistency with earlier
Grade 12 data).


High School graduation rates have also increased over the years. In recent years, about 60% of 
students receive their high school diploma within 4 years of entering grade 9 and upward of 70% 
within 6 years of entering grade 9. Many of these students complete the high school diploma 
requirements with the intent of pursuing further post-secondary studies. With more students 
pursuing post-secondary studies, receiving institutions should expect a broader range of skills 
and knowledge for entry-level students. 


Diplomas Awarded 


                         1930      1960      1990      2000
Dipl. Awarded (#’s)                5934      21,898    29,119*
% of Gr. 12 Pop.
% of Gr. 1 Pop.


* Certificate of Achievement and General Equivalency Diploma Awards are not included to maintain consistency
with earlier calculations. If included, the result would be about 63% rather than 59% and 72% rather than 68%.


Project Details 


Opportunities/Constraints/Limitations 


This study was initiated in the spring of 2000 and targeted for completion in December 2000. 
There were both opportunities and challenges that emerged as a result of this time frame. Key 
opportunities that were presented: 


• Grade 12 teachers expert in their respective subject areas are selected to mark diploma
examinations each July and could be available to support additional achievement-over-time
studies;

• the period immediately following the diploma examination marking sessions is historically
when qualitative achievement-over-time studies have been conducted and within the time
frame for this project;

• a record of achievement-over-time has been reported annually and the reports could easily be
accessed for review and analysis;

• the methodology for conducting qualitative achievement-over-time studies was already
established in LAB and could be applied to new areas; and,

• in August 2000, LAB presented a new achievement-over-time model for the diploma
examination program which could be related to the findings from this project.


The expected December completion date also presented challenges: 


• there was inadequate time to prepare robust new research projects with appropriate piloting,
validation and sampling processes; as a result, there was a reliance on established, proven
methodologies;

• most post-secondary instructors are not available from June to August; thus, convenience
samples were the ‘best’ that could be achieved at the post-secondary level; and,

• the requirements of this project exceeded the available internal Alberta Learning human
resources; thus, contractor support was required.


Given the tight time frame, it was necessary to consider areas that might be informed by 
initiating a ‘pilot’ or ‘probe’ approach. The perceptions of grade 12 teachers and post-secondary 
instructors were suited to this approach given the challenges faced in achieving proper samples
and the need to design and validate new survey instruments. 


Components of the Overall Study 


The overall approach in responding to the research questions was to primarily build upon 
existing achievement-over-time information, to extend the understanding of expectations and 
standards at the grade 12 level and to design a probe into the perceptions of high school teachers 
and post-secondary instructors regarding the ‘preparedness’ of students for further post-secondary
studies. Four components emerged: 


1) Analysis of grade 12 student achievement-over-time information contained in LAB reports; 
and, related national and international achievement evidence. 

2) Additional qualitative student written-response achievement-over-time studies in E30 and 
M30. 

3) Cognitive demand analysis of the 2000, 1990 and 1985 June diploma examinations in 
Chemistry 30, English 30 and Mathematics 30. 

4) A probe into the perceptions of high school teachers and post-secondary instructors 
regarding the ‘preparedness’ of students entering post-secondary studies with a focus on: 
chemistry, English and mathematics. 

As a result of the first Advisory Committee meeting, the achievement of Alberta students in
national and international studies was added to the achievement-over-time review to provide a
broader context for understanding how well Alberta students are achieving.


Findings in Relation to the Research Questions 


The components of this project were designed to inform the original research questions. Existing 
information and findings from the contractor reports are presented below. 


Has student achievement in English 30 and Mathematics 30 changed over the last 10 years? 
(Comments related to Chemistry 30 are included) 


Analysis of grade 12 student achievement-over-time information contained in LAB reports 
(Appendix 1)


The report notes that there is little evidence in the achievement-over-time reports that significant 
differences in achievement have occurred over the years. For all subject areas included in the 
achievement-over-time reports, including Chemistry 30 as well as English 30 and Mathematics 
30, the achievement-over-time evidence that is presented indicates that achievement has 
remained about the same or increased slightly. 


Also, the report highlights the challenges presented in designing valid, reliable
achievement-over-time studies. The methodologies were reviewed and the report notes a few
important factors that impact these studies: changing curriculum, intermittent use of
achievement-over-time studies and student motivation. A comprehensive review of the
achievement-over-time study methodologies was also limited due to the absence of supporting
published technical reports.




Related national and international achievement evidence 


Student achievement evidence drawn from the national SAIP and the international TIMSS 
reports shows that Alberta students achieve well on the national and international stage. 


SAIP Science (1996) indicated Alberta students achieved above that of Canada, overall. SAIP
Science (1999) results were released in June 2000 (SAIP Science 1999, Council of Ministers of 
Education, Canada). Results from this study show that Alberta students continue to out-perform 
students from across Canada. Also, the province’s students in 1999 significantly improved on 
results from the 1996 assessment surpassing expectations. 


SAIP Mathematics (1997) showed the same type of result as science but not as high. The report 
also showed an increase in performance from 1993 to 1997 for Alberta students in problem 
solving. 


SAIP Reading and Writing (1998) indicated that for reading, Alberta students perform about as 
well as Canadian students overall, and for writing, Alberta student performance is higher than 
for Canadian students overall. Comparing these results to those reported for SAIP Reading and 
Writing (1994), reading performance was the same or slightly lower in 1998 (levels 4 and 5);
however, the results for writing were higher in 1998 (levels 4 and 5) compared to 1994.


TIMSS and SAIP results show that Alberta students perform very well. The Government of 
Alberta News Releases capture the overall results: 


(June 11, 1997) ... “Alberta’s grade 4 students outranked all other English speaking 
participants... Alberta students achieved third in science and seventh in mathematics, 
compared to students in 26 countries and provinces taking part in the study.” 

(February 24, 1998: the results for grade 12 students were reported as) “... Alberta 
students achieved the third highest score in science literacy and the fifth highest score in 
mathematics literacy, compared to students in the 24 countries and provinces taking part 
in the study.” 

(June 5, 2000) ... “Alberta’s 13 and 16-year-olds continue to out-perform students 
across Canada... ... 1999 SAIP science results show that 13-year-old Alberta students 
showed significant improvement between 1996 and 1999 at level 3 while Alberta
16-year-old students showed significant improvement in 1999 at both levels 3 and 4.”


Additional student written-response achievement-over-time studies in M30 and E30 (Appendices
2-1, 2-2, 2-3)


Mathematics 30: In general, the Mathematics 30 Written-Response Special Study highlights the 
factors that prevent a comparison of student achievement in 1991 to 2000. A new curriculum, 
new test design and a changed method for scoring written-response questions, all introduced 
between 1991 and 2000, prevent any direct comparisons of student achievement. The report
does describe how Mathematics 30 examination standards and scoring procedures have changed 
and signals that expectations, while changed, have remained high and appropriate. 


English 30: Two additional studies were repeated in July 2000. Results from the English 30 
Written-Response Achievement-Over-Time Study (E30 WR AOT) and the English 30 
Conventions of Language Study (E30 COL) are described below. 


The E30 WR AOT study: Qualitative analysis of the assignments and the scoring criteria indicates
that scoring criteria in 2000 are more demanding than in 1991. When 1991 papers were rescored
using the 2000 scoring criteria there was little difference in the scores given at the acceptable 
level. However, papers receiving a score of 5 (excellence) in 1991 were typically scored lower 
(‘4’) when using the 2000 scoring criteria. Given that similar proportions of students are 
achieving the acceptable level and level of excellence in 2000 compared to 1991, this study 
showed slight improvement in writing skills in 2000 compared to 1991. 


The E30 COL study: The study in 2000 repeated an earlier study that was conducted in 1993 that 
focused on the type of errors found in student papers and the number of errors. In 1993, an
average of 28 errors per paper was found; in 2000, the average was 30.8 errors per paper. It was noted that these
average error counts were not unexpected given that these papers are first draft writing with the 
2000 papers being longer. The analyses considered five scoring categories: 


Punctuation: More errors in 2000 than in 1993
Sentence Structure/Construction: Same number of errors in 2000 as in 1993
Usage: Fewer errors in 2000 than in 1993
Verbs and Pronouns: Mixed/about the same in 2000 as in 1993
Spelling: Slight increase in errors in 2000 over 1993


Overall, this study found that there are slightly more errors in student papers in 2000 than in
1993. The study participants noted that, in 2000, students were asked to write in more
sophisticated ways, take risks and grapple with increasingly complex ideas. The report further
states ‘The goals of clear, complex thinking and clear, cogent expression demand a universal
commitment’.


Have cognitive demands in Grade 12 examinations in Chemistry 30, English 30 and 
Mathematics 30 changed over the last 15 years? (Appendix 3) 


The analysis of the 2000, 1990 and 1985 June diploma examinations in Chemistry 30, English
30 and Mathematics 30 was conducted applying SAIP criteria. The results of this analysis 
provided the basis for the following reported conclusions: 


Chemistry 30: The cognitive demand of the diploma examinations has increased slightly over 
the fifteen-year period. This was reported for the multiple-choice, written-response and 
numerical-response sections of the examinations. 


Mathematics 30: The cognitive demand of the June 2000 diploma examination is considerably
higher than that of the June 1990 diploma examination and about the same as that of the June
1985 diploma examination. This applied to both the multiple-choice and written-response
sections of the examinations. The numerical response section, only appearing on the 1990 and
2000 examinations, was found to be at about the same cognitive demand level. Over the last 15
years there appear to be changing levels of expectation without significant increases in cognitive
demand.


English 30: For the written-response section the cognitive demand of the June 2000 diploma 
examination is higher than that of the June 1990 and June 1985 diploma examinations. This 
applied to both the text and task parts of the written response. For the multiple-choice 
component the cognitive demand of the June 2000 diploma examination is higher than that of 
the June 1985 and similar to the June 1990 diploma examination. In June 2000, the texts were 
considerably longer and ‘more dense’ than the texts in 1990. In 1990, the texts were shorter but 
at least as high in overall cognitive demand as the texts in 2000. Since 1985, there has been an 
increase in the cognitive demand of the English 30 diploma examination with most of the 
change occurring between 1985 and 1990. 


What are the perceptions of grade 12 teachers and post-secondary instructors relative to the 
level of ‘preparedness’ of grade 12 students entering post-secondary studies?
(Appendix 4) 


This probe was conducted within a tight time frame that created significant sampling constraints. 
For the post-secondary instructor perceptions, a convenience sample was used; for the high 
school teacher perceptions, the grade 12 teachers selected to mark diploma examinations were 
used. While the grade 12 teacher group was certainly a representative group, the full range of 
teachers in Alberta was not represented. As a result of the sampling limitations for both 
populations the findings cannot be generalized but serve to inform the feasibility and desirability 
of further investigation on this topic. 


The findings provide some preliminary indication that the perceptions of grade 12 teachers and 
post-secondary instructors differ with respect to how well they believe grade 12 students are 
prepared for further post-secondary study. Generally, the grade 12 teacher responses indicated a 
belief that students are being well prepared for further post-secondary study while the post- 
secondary instructors’ responses indicated a belief that grade 12 students aren’t as well prepared 
as desirable. This initial probe did not attempt to uncover reasons ‘why’ particular perceptions 
existed. The contractor that undertook this probe recommended that: 


“...a larger, more comprehensive study be undertaken to see whether the results apply 
when more generalizable samples are used.” 


The report (Appendix 4) notes the study limitations, which support limiting the use of the results
to informing the feasibility and potential need for further study. 

This is not the first time that perception information has been collected. The Alberta Learning
1992 Achieving the Vision report included results from a survey of post-secondary instructors’
perceptions of the skills of high school graduates in science-related areas. Although the survey
questions were science-directed, only post-secondary instructors were surveyed and a different
response scale was used, the results show a pattern similar to the
results of this probe. Less than half of the post-secondary instructors surveyed in 1992
responded that they felt high school students were prepared at the ‘excellent’ or ‘good’ level; that
reported result parallels the post-secondary instructor responses in this study.


Linking of Results/Conclusions 


Overall, comparing expectations and standards from June 1985, 1990 and 2000, this study found 
that expectations and standards in 2000 are the same or slightly higher than they were 15 years 
ago. Curriculum revisions and new diploma examination designs have served to ensure that high 
standards and expectations are maintained. 


The cognitive demand analysis supports this conclusion. Evidence from that analysis indicates 
that cognitive demands have been maintained or increased in the June 2000 Chemistry 30, 
English 30 and Mathematics 30 diploma examinations compared to respective diploma 
examinations in June 1990 and 1985. 


Student performance, as reported in the Achievement-Over-Time reports, shows that students are
continuing to perform at about the same or slightly higher levels as in previous years; however,
continuity of the achievement-over-time evidence has been limited due to variance in the
subject areas studied and different approaches to studying achievement-over-time.


The qualitative studies in English 30 and Mathematics 30 provide the most direct information in 
understanding the nature of the changes in standards and expectations. Evidence from these 
studies confirms and reinforces that high standards are desired and in place. Together with student
performance information from the Achievement-Over-Time studies and the cognitive demand 
analysis, the collective evidence supports that standards have been maintained or increased 
slightly over the years. 


The probe into the perceptions of post-secondary instructors and high school teachers regarding 
how well students are being prepared for further post-secondary study provides some preliminary 
indication that there may be a difference in the perceptions between the two groups. However, 
the sampling constraints created by the tight time frame prevent generalizing the perception 
findings. It is feasible that a robust study could be conducted if desired; however, a special
project would be required that would ensure adequate time for survey instrument development
and validation, and a sampling design that would support generalizing the results. The value of
such a study would have to be considered given that similar response patterns for post-secondary
instructors were reported in the 1992 Alberta Learning annual report.


If a full perception study were undertaken, a key question would be
whether post-secondary instructor perceptions were based on the quality of the diploma
examinations in reflecting the curriculum standards or on the appropriateness of the curriculum
standards embodied in the diploma examinations. (Note: The appropriateness of the curriculum
standards has not been reviewed as part of this project but is mentioned here as the diploma
examinations are designed to directly reflect the standards and expectations of the respective
curricula.)




Summary Table of Project Findings 
| Study/Report                        | Subject                      | Result in 2000**                |
| Existing AOT Reports*               | Chemistry 30                 | No data available               |
|                                     | English 30                   | Little difference (1989 to 2000) |
|                                     | Mathematics 30               | No data available               |
| Additional AOT 2000 Studies         | English 30 WR                | Slight increase (1991 to 2000)  |
|                                     | Eng. 30 Conventions of Lang. | Slight decrease (1993 to 2000)  |
|                                     | Mathematics 30               | ? Curriculum/exam changes       |
| Cognitive Demand Analysis           | Chemistry 30                 | Slight increase (1985 to 2000)  |
|                                     | English 30                   | Most increase from 1985 to 1990 |
|                                     | Mathematics 30               | Same 1985 to 2000               |
| Post-Secondary Instructor/High      |                              | Preliminary findings:           |
| School Teacher Perceptions Probe    | Post-Secondary Instructors   | Less than 50% say prepared      |
|                                     | Teachers (Grade 12)          | More than 50% say prepared      |


*AOT — Achievement Over Time 
** = ‘no change’; + ‘increase’; - ‘decrease’; ? ‘couldn’t determine’ 


Recommendations 


The following recommendations are presented to support and/or enhance current practices in the 
collection and reporting of information related to the standards and expectations for students 
leaving the secondary system. 


Recommendations for the Curriculum Standards Branch, Basic, Alberta Learning: 


1. That the relationship between grade 12 standards and respective standards in selected first 
year post-secondary courses be formally reviewed and analyzed when a major high school 
curriculum change in a core area is introduced. 


Discussion: This recommendation is included to provide a more substantive understanding 
of differences in standards between secondary and post-secondary programs. Without this 
understanding, an overreliance on perception data may occur. The articulation and 
seamlessness between secondary and post-secondary courses emerges as an issue once in a 
while. Discussions around the issues that emerge are partly informed by perceptions, and 
this strategy would help to move from reliance on perception information to documented 
analysis of related courses. 




Recommendations for the Learner Assessment Branch, SIR, Alberta Learning: 


2. That an improved design for AOT be developed at the Grade 12 level that provides for 
design stability for at least 5 years, parallels current diploma examinations, and supports and 
facilitates reporting on student performance over time. 


Discussion: The current strategies for tracking changes in achievement over time in the 
diploma examination program have not been successful in providing consistent or definitive 
subject-specific information. A new strategy is needed. 


3. That AOT Technical Reports be prepared for publication that support external critical review 
of methods and procedures. 


Discussion: Currently, the technical information/reports that have been created have not 
been prepared for publication. Given its importance, the Diploma Examination Program would 
benefit from the support of published Technical Reports that document the technical aspects 
of the program, supporting ‘open’ and ‘transparent’ processes. 


4. That a cognitive demand analysis of select diploma examinations be conducted every 5 years 
(e.g. 2000, 2005, 2010...) using external criteria such as those developed through the School 
Achievement Indicators Project (SAIP). A 5-year cycle would apply when curriculum 
remains stable. With the introduction of a new curriculum or new diploma examination 
design, a diploma examination cognitive demand analysis would be conducted on the new diploma 
examination in that subject to provide benchmark information. Subsequent cognitive analysis 
in that subject would then be aligned with the 5-year cycle. 


Discussion: Analyzing the cognitive demand of diploma examinations provides an additional 
insight into the nature of standards and expectations and how these change over time. Use of 
the nationally developed and accepted SAIP criteria would provide an external perspective 
on the standards associated with Alberta diploma examinations. This strategy would provide 
valuable information in noting changes in expectations over time resulting from significant 
or incremental changes to curriculum and/or diploma examination design. 


5. That the Qualitative AOT study model be extended to focus on specific priority areas (for 
example, problem solving in mathematics, or changes in scoring standards and criteria in 
subjects with significant curricular change). This strategy would gauge changes in 
expectations and benchmark new standards. 


Discussion: There is great value in in-depth understanding of standards and expectations in 
priority areas. The intermittent need for specific information would be supported through 
this strategy. 


6. That the LAB continue to engage external experts to advise on AOT 
designs/methodologies to maintain current best practices. 

Discussion: Valuable external expert advice, knowledge of emerging/best practices and 
maintaining high credibility are key benefits. 


Appendices 


APPENDIX 1 


ACHIEVEMENT OF ALBERTA STUDENTS 


EVIDENCE FROM THE DIPLOMA EXAMINATIONS 1989 - 2000 
AND FROM SAIP AND TIMSS 1996-2000 


September 24, 2000 


Summary 


At the request of Alberta Learning, fourteen reports of investigations into changes in 
performance on the Alberta Diploma Examinations were reviewed. In addition, portions of 
reports relating to Alberta students’ performances on three SAIP and one TIMSS assessment 
were examined to see if trends in achievement could be drawn from the studies. 


While there is little evidence of notable gains or losses in the average level of student 
achievement on the Diploma Examinations since 1989, there is evidence to suggest that Alberta 
students perform at a higher level than students in most other Canadian provinces in 
mathematics, science and writing. In addition, Alberta students perform as well as students from 
many of the industrialized countries that participate in international comparative assessments in 
science and mathematics. 


In this report, the general problem of evaluating achievement over time is discussed. The 
approach to studying achievement over time in the context of the Alberta Diploma Examinations 
is analyzed and suggestions for improvements are made in terms of ten recommendations. 


Recommendations: 

1. The Learner Assessment Branch of Alberta Learning should, as a matter of course, create a 
technical manual that describes exactly the processes that are used for all of their 
psychometric activities. 


2. The anchor test approach to studying achievement over time is an excellent procedure to use, 
but in the environment of the Diploma Examination program, it is difficult to do well. If the 
existing constraints are to be maintained, special studies must be designed so that student 
motivation on the anchor test approximates motivation on the corresponding Diploma 
Examination. 


3. The anchor test design should restrict the use of a specific anchor test to three or four years. 


4. Changes in performance should be reported for upper and lower students (e.g. 10th and 90th 
percentile) as well as equated means. 


5. Differences in achievement over time should be expressed in standardized units (effect sizes) 
and not as significance levels. 


6. The credibility of assumptions required by the anchor test design should be examined 
regularly. 


7. If long-range, broad-based achievement over time studies are desired [for example, studies of 
Physics 30 over 10 years], special studies should be designed and implemented. See, for 
example, Clarke, Nyberg and Worth (1977). 

8. Specific aspects of achievement over time (e.g. specific reporting skills, problem solving, 
and laboratory investigations) should be studied by carefully designed studies. The 
qualitative studies described in the body of the report are good examples, but quantitatively 
oriented studies could also be undertaken using specifically focused tests administered to 
representative samples of students spanning an interesting time interval. 


9. Factors that are known to influence achievement over time should be investigated as part of 
the AOT design so that explanations for differences can be proposed. [The Alberta Learning 
website could offer opportunity for commentary and debate]. 


10. Alberta Learning should only participate in national and international projects that are 
designed to provide explanations of the results. [TIMSS is a good example.] 


I. Achievement over Time and the Diploma Examinations 
A. A Brief Introduction to Achievement and Achievement Over Time 


Student achievement as a concept is at once simple and complex. As students move from one 
grade to the next it is clear that they are able to do things that they couldn’t do previously. They 
know more than they used to, they can perform more complex tasks, they can express themselves 
in a more sophisticated fashion, they can solve more difficult problems, and they can reason 
more efficiently. Metaphorically, it is as though they have moved along the achievement 
highway. But this portrayal of achievement as some sort of continuum along which we can place 
students hides some very diverse, complex, and poorly described attributes. For example, to talk 
about achievement in History involves notions of memory, proficiency in organizing ideas, 
insight into the nature of social forces, ability to analyze arguments, etc. To say that one person 
has a higher achievement in history than another requires that we disregard the multifaceted 
nature of the underlying cognitive processes, that we see the substance of history as 
pedagogically sequential, and that we confine ourselves to a specific corpus of material. 


For the past 40 years it has been useful to speak of achievement in terms of particular 
educational goals. Generally speaking these goals are packaged into curricula and divided into 
courses. Under this umbrella, achievement refers to how closely the states of students’ minds 
approximate the statements of goals that are to be found in program outlines, course guides and 
curricular objectives. The operationalism surrounding the social and behavioral sciences of the 
1950s and 60s provided strong reinforcement to this view, and in addition promoted an 
approach to assessment of achievement as measurement of achievement. Measurement of 
achievement implies the assignment of numbers to represent positions of performance along an 
achievement continuum. Thus assessment becomes linked to a particular underlying 
psychometric model. An unfortunate consequence is that progress to those aspects of educational 
goals that can be measured is given prominence. Another consequence is that the measure 
becomes the achievement and although the measure is dependent upon a particular procedure for 
mapping content into tasks that can be scored, the specificity of the process is forgotten when we 
come to make claims about the level of achievement at a particular time. 


Achievement over time (AOT) is the label that is used to describe studies that aim to detect 
changes in average test performance over time within a jurisdiction. The main purpose of AOT 
studies is to monitor system performance. To accomplish the task there are several difficulties 
that must be overcome. First, the tests used before and after the time interval must have the same 
metric. That is, the scales must have the same ‘zero point’ and the same units of measure. This is 
most directly handled by using the same instrument on both occasions. Second, the tests used on 
the two occasions must be valid assessments of achievement at each time. Third, in order to 
interpret the results, the confounding influences that occur between the occasions must be well 
understood. These would include changes in the population of students, changes in the 
curriculum, changes in system supports, etc. The first and second difficulties might act in 
opposition to each other if one of the confounding influences is a change in curriculum. 


An example of an AOT study that illustrates some of the problems that are encountered was 
carried out by Clarke, Nyberg, and Worth (1977). In 1956, these researchers administered 
several ability and achievement tests to all of the children attending the third grade of Edmonton 
Public Schools. Nineteen years later the same tests were administered again to all of the children 
attending grade 3 in that year. Attempts were made to interpret differences in performance. In 
this study, although the tests were identical on the two occasions, their relationship to the 
curriculum was clearly different. Attempts were made to exclude items whose curricular 
relevance was deemed to be questionable, but a study by Blackmore (1980) showed how 
unsuccessful this attempt was. Moreover in the intervening 19 years the demographic 
characteristics of Edmonton had changed so much that it made interpretation of differences 
extremely tentative. Thus to say that the achievement of Edmonton children had changed or not 
would have meant setting aside concerns about both differences in validity of the tests on the two 
occasions and differences in what constitutes “Edmonton children attending grade 3.” Despite 
these criticisms, the study has an attractive rationale. If we want to know how students of today 
would perform on yesterday’s tests, the Clarke design is reasonable to use especially if we must 
be content with retrospective data. 


B. Achievement Over Time in the Alberta Diploma Examination Context — the Approaches 


In 1983-84 the Alberta Government re-established Diploma Examinations in several 
academic subjects for students finishing their final year of study. Initially, exams in a particular 
subject such as English 30 were administered in January and June of a school year, with an 
examination available in August both for first-time writers and for writers with supplemental 
privileges. The examinations were and continue to be based upon the course of studies and are 
created by panels of teachers who are either currently or recently engaged in teaching the course. 
In all subjects, the examinations are a combination of objectively scored multiple-choice 
questions and extended response questions that require the use of panels of teachers to score the 
responses. 


While the main motivation behind the Diploma Examination program was to “develop and 
maintain excellence in educational standards by certifying academic achievement” [AOT-11], 
since 1989 there has been a desire on the part of Alberta Learning (formerly Alberta Education) 
to answer the question, “Has achievement as measured by the Diploma Exams changed over the 
past few years?” Several studies comparing AOT have been undertaken by the Learner 
Assessment Branch (formerly the Student Evaluation Branch) of Alberta Learning. These have 
been reported in the Annual Reports of the Learner Assessment Branch and other documents, 
and are listed in the references as AOT-1 to AOT-11 (my labels). 


As is pointed out in AOT-11, there are several constraints that have been placed on the Diploma 
Examinations that make the assessment of AOT difficult. Principal among these is that the tests 
used in January and June are made public after they have been administered. This means that 
none of the questions that have been used in one year can be used in subsequent years. To get 
around the problem of non-equivalent instruments, three approaches have been taken to studying 
AOT. 


Use of the August administration. According to AOT-1, in Biology 30, a common set 
of 67 multiple choice questions was administered in August of 1988, 1989, and 1991. 
Results were compared for students who wrote for the first time, and were not 
classified as mature students. This approach did not appear after 1991. 


Use of anchor tests to equate Diploma Examination results. The exact procedure for 
using anchor tests to equate results from one year to the next has not been described 
in the School Year Annual Reports. This is unfortunate because it makes it difficult to 
evaluate the credibility of the reported results. The following description is based on 
information provided in AOT-1 through AOT-8.1, and AOT-9. As noted in AOT-1, 
anchor tests are multiple-choice tests designed to be parallel to the multiple-choice 
components of Diploma Exams. They are administered to a sample of students 
approximately 2 weeks prior to the administration of the diploma exam in a particular 
subject. The anchor tests are kept secure and administered in subsequent years. 
Anchor tests were first developed for English 30 and 33 and Social Studies 30 in 
1989. In 1990 they were developed for Biology 30, Chemistry 30, Math 30 and 
Physics 30. For almost all studies, the anchor tests were used to equate the June 
examinations. Results from the administration of the diploma exam are matched with 
the results of the anchor test for students in one year, and again in a subsequent year. 
A procedure called linear equating is then used to map the diploma results from a 
subsequent administration onto the scale of the original or baseline administration. 
There are several approaches to linear equating, and it appears that the procedure 
outlined by Angoff (1982, page 61) was used. 


Use of qualitative procedures to assess achievement over time. One of the guiding 
principles of the Diploma Examination program is that there is a careful attempt to 
produce standards of performance that are stable from one year to the next. The two 
main standards are “Excellent” and “Satisfactory.” One way of examining changes 
over time is to compare the proportions of students who achieve these standards from 
one year to the next. To refine the process, in several studies procedures were used to 
compare the levels of the standards themselves. 
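
The anchor-test equating described above can be illustrated with a minimal sketch. It assumes a chained linear approach: the new exam is linked to the anchor test within the new-year sample, and the anchor test is then linked to the baseline exam within the baseline-year sample. This is not necessarily the Angoff procedure that was used, which the reports do not document. The function names and all scores are hypothetical.

```python
from statistics import mean, pstdev


def linear_link(from_scores, to_scores):
    """Return a function mapping the 'from' scale onto the 'to' scale by
    matching means and standard deviations within one group of students."""
    m_from, s_from = mean(from_scores), pstdev(from_scores)
    m_to, s_to = mean(to_scores), pstdev(to_scores)
    slope = s_to / s_from
    return lambda x: m_to + slope * (x - m_from)


def chained_equate(new_exam, new_anchor, base_anchor, base_exam):
    """Chain two links: new exam -> anchor (new-year sample),
    then anchor -> baseline exam (baseline-year sample)."""
    to_anchor = linear_link(new_exam, new_anchor)
    to_base = linear_link(base_anchor, base_exam)
    return lambda x: to_base(to_anchor(x))


# Hypothetical scores, invented for illustration only.
base_anchor = [20, 25, 30, 35, 40]
base_exam = [40, 50, 60, 70, 80]
new_anchor = [22, 27, 32, 37, 42]   # new cohort does 2 points better on the anchor
new_exam = [38, 48, 58, 68, 78]     # but the new exam happens to be harder

equate = chained_equate(new_exam, new_anchor, base_anchor, base_exam)
equated_mean = mean(equate(x) for x in new_exam)
print(round(equated_mean, 1))  # equated mean on the baseline exam scale
```

In this invented example the new cohort's raw exam mean (58) is below the baseline mean (60), yet its equated mean is higher, because the anchor test shows the cohort to be slightly stronger. The sketch also makes plain how much depends on the anchor sample behaving like the full population.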




C. Results of the Alberta Diploma Examination AOT Anchor Design Studies 


AOT anchor design studies have been carried out since 1991. In some subjects these have 
allowed comparisons to be made from 1989 to 1999. Since many of the problems encountered 
and the results differ among subjects, this section will be organized by subject. 


The general format of the AOT anchor design reports consists of a very brief description of the 
process of equating, a table consisting of the number of students writing the anchor tests, a 
second table showing equated means for each year to the current year, and a statement of which 
years are significantly different from each other. A significance test is used to guide the claims of 
change over time. Although no technical document accompanied the AOT reports, perusal of the 
computer program used to test significance revealed that under suitable assumptions, the test 
calculations should provide appropriate approximations. 


Under the equating system used, the relationship between the equating test and the Diploma 
Examination should remain relatively invariant over time. This is a consequence of the fact that 
the construct being assessed (achievement in ...) is constant over time. One check on this would 
be to examine the homogeneity of regression of the diploma exam results on the equating test. 
So far as I could tell this was not done. 
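
A minimal sketch of the kind of check suggested above, assuming matched anchor and diploma scores are available for each year. It only compares fitted slopes descriptively; a formal test of homogeneity of regression would also need a significance test for the difference between slopes. The data and names are invented for illustration.

```python
from statistics import mean


def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Hypothetical (anchor, diploma exam) scores for two years.
year_a_anchor = [20, 25, 30, 35, 40]
year_a_exam = [41, 49, 60, 71, 79]
year_b_anchor = [22, 27, 32, 37, 42]
year_b_exam = [50, 54, 58, 62, 66]

slope_a = ols_slope(year_a_anchor, year_a_exam)
slope_b = ols_slope(year_b_anchor, year_b_exam)

# A large gap between the slopes would cast doubt on the assumption that
# the anchor test relates to the diploma exam in the same way each year.
print(round(slope_a, 2), round(slope_b, 2))
```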


Another statistical issue worthy of mention is that the anchor design reports on AOT are based 
entirely on the performance of the population mean. With the psychometric technology available 
it would have been possible to equate other values (for example the 10th and 90th percentiles) to 
see if changes occurred differentially along the achievement continuum. 


A further point that must be addressed when considering the overall design is the nature of the 
anchor test sample, their motivation and preparedness for the test, and their consistency from one 
year to the next. It is clear that students prepare for (and are prepared for) high stakes 
examinations, and the amount of preparation that goes into the final few weeks can be 
substantial. How well students prepared for the anchor tests, how motivated their performance 
was, and how carefully they were selected so that the anchor sample was representative of the 
total population are critical questions that are not answered in the documents. 


1. English 30 and English 33. 


From the various AOT reports, it appears that for English 30 and 33, the same anchor test was 
used from 1989 to 1999. Studies reporting AOT for 1990, 1991, and 1992, use 1989 as the 
baseline and all results are mapped onto that scale. Data for these years use the raw score metric 
for reporting. Beginning in 1993, the baseline year was changed to 1992, and the reporting 
metric was changed to the average percentage on the machine scored component. Data for 
English 33 are only reported for the years between 1989 and 1996 (inclusive). 


Studies from 1993 to 1998 use 1992 as the baseline year. In 1999, the AOT-9 study used 1998 as 
the baseline year. Since there are some inconsistencies occurring in AOT-9, it will be described 
separately. The 1997-98 AOT study (AOT-8.1) provides a cumulative record of the English 30 
results from 1989 to 1998. The data shown in Table 5-2 and Figure 5-1 of that document indicate 
that for the first 3 years the mean was in the vicinity of 68.3. The baseline year of 1992 (actual 
data) had a mean of 67.5. From 1993 to 1996, the mean rose steadily to 71.2, fell abruptly to 66.7 
in 1997, and then rose to 71.3. There are two features of the data that are important. The first is 
the general rising trend in results from 1992 to 1998. Assuming that the raw score standard 
deviation is in the vicinity of 10 points, the general increase from the baseline year 
approaches four tenths of a standard deviation. Part of this improvement may be due to the 
unusually low value for 1992, and part of it may be due to the possible reduction in validity of 
the equating test over time. However neither of these influences would be sufficient to account 
for the change. AOT-11 notes that the participation rates in English 30 declined during the 
interval and the rates for English 33 rose. If less able students were siphoned from English 30 to 
English 33, then the results would be consistent with the pattern that emerged. There are likely 
other explanations both statistical (see earlier comments on the use of anchor tests and the 
assumptions underlying equating), and pedagogical (for example changes in method and content 
over the time interval). 
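
The arithmetic behind the "four tenths" figure can be reproduced as follows. The standard deviation of 10 points is the guess stated above, not a reported value, and the function name is illustrative.

```python
def effect_size(mean_later, mean_baseline, sd):
    """Difference between two means expressed in standard deviation units."""
    return (mean_later - mean_baseline) / sd


# 1998 equated mean (71.3) against the 1992 baseline mean (67.5),
# with an assumed standard deviation of 10 points.
d = effect_size(71.3, 67.5, 10.0)
print(round(d, 2))  # 0.38
```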


The second feature of the data is the low mean for 1997. The 1997 (AOT-7) report says, 
“Although the level of achievement in 1997 is the lowest since the achievement-over-time study 
began, it is only significantly lower than 1996 and 1995. Whether this lower achievement in 
1997 represents a trend will not be known until the results for 1998 can be analyzed” (page 28). 
This is a reasonable response, although it would have been reassuring to know that the sampling 
procedure, the calculations and the administrations of the anchor test and Diploma Examination 
had presented no anomalies. After the 1998 data had been analyzed (AOT-8.1) it was reported, 
“It appears that the lower achievement in 1997 was not a trend but perhaps an anomaly” (page 
28). Simply put, this is not an explanation. The whole purpose of a monitoring program is to 
examine trends and exceptions and provide explanations as to why things may be occurring as 
they are. In this instance the analysts failed to complete their task. 


In 2000, the AOT-9 study examined data for English 30 and Social Studies 30 for 1992 to 1999 
and for Social Studies 33 and Biology 30 for 1998 to 1999. AOT-9 used 1998 as the baseline 
year. According to the procedure, “All data were screened for outliers: extreme cases that do not 
represent the group. Outliers for this study were defined as students whose anchor test scores 
were 20% or more discrepant from their corresponding Diploma Examination score. These 
students’ scores were excluded from the study” (page 1). This would suggest that AOT-9 would 
have fewer students in its anchor samples than the Annual Reports. This is not always the 
case. For 1992, AOT-9 has a larger sample (431 vs. 360). Looking at the data, we can compare 
the results with the AOT Annual Reports by plotting the results against each other. This is shown 
in Figure 1. While not great, there is a difference in the relative placement of 1996 and 1998. 
When the significance tests are considered, AOT-9 found three significant differences in the 
years from 1992 to 1998 (1992<1996, 1997<1996, and 1997<1998). The Annual Reports found 
these three, but in addition, found 1997<1995, and 1992<1998. 


The point of this comparison is not to criticize the work in AOT-9, nor that of the analysts who 
produced the 1998 AOT special study. Rather it is to point out that the statistical strategies 
behind AOT studies are not exact measurement issues. Different assumptions, decisions about 
credible data, and other choices that are part of the statistician’s art can influence results. It is 
important to treat the investigations as explorations: to describe the trends and exceptions, to try 
to establish their robustness, and then to seek plausible explanations. 


The data for English 33 are reported from 1989 until 1996. The baseline year of 1992 is 
unfortunately the year in which only 114 students participated in the anchor study. The equated 
means varied from 63.7 (1991) to 67.1 (1994). Because there were only 146 students in the 
anchor study for 1994, the mean of 67.1 may be associated with large sampling error in the 
equating function estimates. As the 1996 AOT Annual Report (AOT-6) states, “In English 33 
there has been no significant change in the level of student achievement on the machine-scored 
component of the Diploma Examination since 1989.” 


2. Social Studies 30 


The administration pattern for Social Studies 30 is the same as for English 30. The results are 
somewhat different. In the 1991 report (AOT-1), it is noted that, “... there does appear to be a 
change in achievement in Social Studies 30. It is estimated that 1991 students would, on average, 
have been able to answer approximately two more questions than the 1989 students if they had 
written the 1989 Social Studies 30 Diploma Examination” (page 47). The writers go on to 
speculate that, “...the increase in achievement observed on the multiple-choice component of the 
Social Studies 30 examination may be a reflection of students’ improved ability to grapple with 
critical thinking skills. This improvement may be due, in measure, to an increased instructional 
focus on skill development in the classroom” (page 47). A year later (AOT-2), it is reported, 
“Social Studies 30, although not showing a statistically significant difference, continues to hint at 
a slight increase each year. However, the observed differences may have been due to random 
error within the sampling procedure” (page 30). And in the 1993 report (AOT-3), the results are 
reported without any explanation whatsoever. As noted under English 30, it is unfortunate that 
reports since 1992 have treated the results as somehow standing on their own without 
explanation. Data seldom speak for themselves, and the implications of the outcomes need to be 
investigated. 


From 1989 to 1992, the equated scores were expressed in the raw score metric. From 1993 
onward, the percent of total score metric was used for reporting. In the 1992 report (AOT-2), the 
mean score of 48.3 was higher than the equated mean score for the previous year (47.3) relative 
to the baseline of 1989. In 1993 (AOT-3), the mean score for 1992 (65.4) was lower than for 
1991 (66.4). Since the number of students used in the equating studies is the same, this anomaly 
suggests that there was an error in one or other of the calculations. 


A further discrepancy occurs when the 1998 AOT study (AOT-8.1) is compared with AOT-9. 
Even though the 1998 study used 1992 as the base year, and AOT-9 used 1998 as the baseline 
year, the data points should fall on a straight line. Recalling that AOT-9 tried to make the results 
more accurate by eliminating outliers, it is likely that there would be some variation. But as 
Figure 2 shows, there are great differences between the two approaches. When the question of 
significance is addressed, AOT-8.1 shows 4 significantly different years (92<93, 92<95, 92<98, 
and 94<98), whereas AOT-9 found 5 (92<94, 92<95, 92<96, 92<97, and 92<98). The two 
studies agree only on two significant differences. 
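
The overlap between the two sets of findings can be checked directly. The sketch below encodes each reported significant difference as a (lower year, higher year) pair; the variable names are illustrative.

```python
# Significant differences as (lower year, higher year) pairs,
# transcribed from the two lists given above.
aot_8_1 = {(1992, 1993), (1992, 1995), (1992, 1998), (1994, 1998)}
aot_9 = {(1992, 1994), (1992, 1995), (1992, 1996), (1992, 1997), (1992, 1998)}

shared = sorted(aot_8_1 & aot_9)       # found by both studies
only_8_1 = sorted(aot_8_1 - aot_9)     # found only by AOT-8.1
only_9 = sorted(aot_9 - aot_8_1)       # found only by AOT-9

print(shared)  # [(1992, 1995), (1992, 1998)]
```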


AOT-9 concludes, “In Social Studies 30, achievement was significantly lower in 1992 
compared with 1994 through 1999. This suggests that student performance improved 3% from 
1992 to 1993 but has changed very little since 1993” (page 3). Setting aside the dubious statistical 
reasoning (according to their results, 1993 when the change allegedly took place was not 
significantly different from 1992), their claims contrast with those of the authors of the 1998 
study who say, “In Social Studies 30, there has been gradual improvement in students’ 
performance from 1989 to 1998. The 1998 level of achievement is the highest since the 
achievement-over-time study began, significantly higher than in 1989, 1990, 1992, and 1994” 
(page 28). It is clear that different analyses of the same data can lead to different conclusions. 


3. Biology 30 


Although not an anchor test design, as noted in the description of methods, the first AOT study 
for Biology 30 was carried out in August of 1988, 1989, and 1990. The results were based on 67 
multiple-choice questions administered on all three occasions [AOT-1]. On each occasion, 
between 120 and 140 first time, non-mature students responded. The means were between 67.2 
and 68.7. It was concluded that the change in performance was not significant. The analysts 
cautioned that examining the aggregate may be misleading if there is improvement or decline in 
specific skills. This is an important point because the Diploma Examinations are designed to 
assess global achievement. Generally they have too few questions to give a reliable indication of 
level of performance on specific skills. For example, the 67 multiple-choice questions were 
designed to cover 10 major concepts. 


In 1992, the three year AOT anchor study in Biology 30 (AOT-2) showed a steady decline in 
mean raw score: 46.5, 45.9, 45.5. In 1993 (AOT-3) when the baseline was changed to 1992, and 
the percent score metric was used, the corresponding means for the first 3 years were not in the 
same order (67.1, 65.7, and 66.6 for 1990, 1991, and 1992 respectively). In this case the conclusions 
are unaffected since none of the differences is significant. However, the equated performance for 
1994 (AOT-4) was found to be significantly different from the means of 1991 through 1993, and 
from the summaries provided, the influence of metric and baseline choice on the conclusion 
cannot be disentangled. 


In 1995 the curriculum changed for Biology 30, and there was no AOT study for that year. 
Presumably a new anchor test was constructed in 1995, because in 1996 a non-significant 
improvement in mean performance was found (AOT-6). 


The final study was reported in AOT-9 and compared 1998 and 1999 performance. The anchor 
sample for 1998 was only 140, and the writers of AOT-9 urge caution in interpreting the results. 
For the first time, standard deviations are reported, and although the difference between average 
performance was not significant (1998 mean = 68.9%, standard deviation = 17.8; 1999 equated 
mean = 69.9, standard deviation not reported) we can see that the effect size for a change of | 
percent is very small. One- percent change in average performance corresponds to about one 
twentieth of a standard deviation. This is important, because it illustrates one of the main 
problems with the Alberta Learning AOT studies. Even significant changes in average equated 


performance correspond to effect sizes that may be less than one or two items. The use of effect 
sizes to describe differences would provide readers with a basis for determining how important . 
the changes are. 
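
The effect-size arithmetic described above can be sketched in a few lines of Python. This is an illustration only, not a procedure taken from the AOT reports. Because the 1999 standard deviation was not reported, the 1998 standard deviation is assumed to serve as the standardizer. The item count of 67 is likewise an assumption, borrowed from the 1988-1990 Biology 30 anchor test purely to show how a percent change translates into items. The function names are hypothetical.

```python
def standardized_difference(mean_new, mean_base, sd_base):
    """Express a change in means in units of the baseline standard deviation."""
    if sd_base <= 0:
        raise ValueError("sd_base must be positive")
    return (mean_new - mean_base) / sd_base


def percent_to_items(percent_change, n_items):
    """Convert a change in percent score into a number of test items."""
    return percent_change / 100.0 * n_items


# Figures reported in AOT-9 for Biology 30 (percent scores).
MEAN_1998 = 68.9
SD_1998 = 17.8
MEAN_1999_EQUATED = 69.9

# Assumption: 67 items, borrowed from the 1988-1990 anchor test for illustration only.
ASSUMED_ITEM_COUNT = 67

effect = standardized_difference(MEAN_1999_EQUATED, MEAN_1998, SD_1998)
items = percent_to_items(MEAN_1999_EQUATED - MEAN_1998, ASSUMED_ITEM_COUNT)

print(f"Effect size: {effect:.3f} standard deviations")
print(f"Equivalent change: {items:.2f} items out of {ASSUMED_ITEM_COUNT}")
```

Under these assumptions the one-point change works out to roughly 0.056 of a standard deviation, or about two-thirds of a single item on a 67-item test.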


4. Chemistry 30 


For Chemistry 30, AOT studies were carried out from 1990 to 1996. In AOT-6, the trend was 
summarized as, “In Chemistry 30, the 1996 results were significantly higher than the 1991 and 
1994 results.” For the earlier studies, 1990 was used as the baseline and there were no important 
differences among equated raw score means from 1990 to 1992. Once again in 1993, the 
reporting metric was changed to percent scores and the baseline year was changed to 1992 to be 
consistent with all of the other subjects (AOT-3). As in other cases this change produced an 
anomaly in the scores. Under the raw score metric the equated mean for 1992 was greater than 
the actual mean for 1990 (34.6 vs. 33.1). Under the new metric the order was reversed (67.5 vs. 
68.7). 


In 1994 (AOT-4), the mean equated score of 65.5 was said to be significantly lower than the base 
year of 1992 (mean = 67.5). However it was not significantly lower than the equated mean for 
1990 (68.7) which was even higher than 1992. This strange occurrence was made even stranger 
in 1995 when the conclusion was reached that there were no significant differences since 1990! 
In other words a difference that was significant one year became non-significant a year later. A 
technical document would have explained these contradictions, but for the interested professional 
reader, dependence on statistical tests to determine important differences is problematic. 


In 1995 the curriculum for Chemistry 30 (and for Mathematics 30 and Physics 30) changed, and 
a new anchor test was constructed “containing sufficient items from the old anchor and the new 
curriculum to bridge performance before 1995 to performance in 1995 and after” (AOT-5 page 
31). This creates an interesting problem. If a revised anchor test were constructed, and the 
baseline remained as 1992, then it is difficult to understand why the equated means for the time 
prior to 1995 remained the same in the 1995 study (AOT-5) and the 1994 study (AOT-4). What 
must have happened is that the new anchor test was used to equate 1995 to 1992 (and later 1996 
to 1992) whereas the old anchor test was used to equate 1994 and 1993 to 1992. This means that 
the underlying continuum from 1995 onward is different from the continuum prior to 1995. No 
acknowledgement is made of this, and in 1996 the conclusion is made that the mean for 1996 
was significantly different from 1994 and 1991. Given that 1996 was measuring something 
different (after equating) than 1994 and 1991, it is difficult to see how the conclusion has 
credibility. To make the claim, all of the years from 1991 to 1996 should have been equated 
using the new anchor test. 


5. Physics 30 


AOT-6 summarized the trend for Physics: “In Physics 30, the 1996 results were significantly 
better than the results for 1990, 1992, 1993, and 1994. In addition, the results for 1994 were 
significantly lower than the results for 1992, 1993, and 1995.” 




The detailed comments for Physics 30 are very similar to Chemistry 30, except that in 1991 no 
anchor test was administered. As with Chemistry 30, the curriculum was changed in 1995, and 
the anchor test was adjusted accordingly. Once again the claims that 1996 results were 
significantly higher than the results for 1994 and previous years, and that 1995 was higher than 
1994, are difficult to sustain since the achievement metric being assessed changed during 
1995. 


6. Mathematics 30 


The first anchor test for Mathematics 30 was administered in 1990. No anchor test was 
administered in 1991, and in 1992 because of a change in curricular emphasis, only the part of 
the anchor test covering the topics of polynomial functions and trigonometric functions could be 
used. The 1993 report dropped comparisons prior to 1992 because of these constraints. 
Presumably a new anchor test was developed, but that is not stated. 


In Mathematics 30, the problem in curricular change occurred again in 1995 and it appears to 
have been dealt with in the same way as in Chemistry 30 and Physics 30. In 1996, the claim is 
made that student achievement in Mathematics 30 has “shown significant improvement each 
year” in spite of the fact that Table 6-2 (AOT-6) does not list the 1996 to 1995 difference as 
being significant. 


D. Results of the Qualitative Studies 
1. English and Social Studies 


Since 1990 there have been three major qualitative studies of English 30 and Social Studies 30. 
Two of these have also involved English 33. The general design for studying achievement over 
time is outlined in AOT-10 (1991). The Learner Assessment Branch routinely saves 1000 
papers from each administration of the Diploma Examinations to be used for subsequent research 
and development. The qualitative studies use this pool to draw papers from a previous 
administration to compare with papers from the current administration. Essentially the technique 
focuses on two levels of achievement: Excellent and Satisfactory. Essays in the Diploma 
Examinations in English and Social Studies are scored on a five-point scale with a rating of 3 
being Satisfactory, and a rating of 5 being Excellent. Essays were chosen from 1984 and 1990 
that had been rated at these levels, and a panel of ten teachers in each of the three subject areas 
was asked to read the papers and provide written characterizations of them along three 
dimensions: thoughtfulness, effectiveness, and correctness. The characterizations were clustered 
to develop portrayals of essays rated 3 and 5 for both 1984 and 1990 in the three subject areas of 
English 30, English 33, and Social Studies 30. Two questions guided the study: (i) Did students 
who wrote Diploma Examinations in June 1990 produce better compositions than did their 1984 
counterparts? (ii) Have the standards and expectations for written responses at the Satisfactory 
(3) and Excellent (5) levels of performance changed since 1984? 


It was concluded that in all three subjects the level of expectations embedded in the scoring 
criteria had increased substantially from 1984 to 1990. In addition, it was found that 
papers given a rating of 3 (Satisfactory) were significantly better in all three subjects in 1990 
than in 1984, and this also occurred at the Excellent level. [Note: the term significant was used in 
its pedagogical sense.] 


The biggest problem with the AOT-10 report is that it does not describe the exact selection 
strategy employed, so the number of essays that were sampled in each year for each level is 
not provided. Moreover, whether the papers used were a random sample of papers at a particular 
level, or were papers purposefully selected as pristine examples of papers at the Satisfactory and 
Excellent levels, cannot be determined from the report. Because of this, it is impossible to place a 
value on the evidence presented. If the study is based on two or three hand-picked papers for 
each cell, it is hard to give any credence to the conclusions. If it is based on 30 papers randomly 
selected to represent each cell, then the claims would be very compelling, and it would be fair to 
say that performance has improved. 


In spite of the criticism expressed above, this method is an excellent strategy for assessing the 
underlying questions of interest. Are students at one time writing better essays than students at a 
previous time? 


At the end of the 1991 school year a second study was carried out using the AOT-10 method. 
This study (AOT-1) compared the major assignment in English 30 for June 1989 with that of 
June 1991; the argumentative essay for Social Studies 30 for June 1987 with that of June 1991; 
and functional writing in English 33 June 1984 with June 1991. These comparisons were 
selected in an attempt to make the actual writing assignments as similar as possible. For English 
30 at the Satisfactory level there were no great differences between the sets of papers, but at the 
Excellent level, a “dramatic increase in achievement in the areas of thoughtfulness and 
effectiveness took place between June 1989 and June 1991 (page 48).” For Social Studies 30 
improvement was noted at both levels, but raters noted that many of the 1987 papers that 
received a score of 3 in 1987 would have been downgraded to 2 in 1991. The papers for English 
33 were better in 1991 than in 1984 at both levels. Once again in the 1991 School Report study, 
no details on the numbers of papers used in the comparison are given. 


In 1997 (AOT-7), the AOT-10 method was again applied, comparing English 30 and Social 
Studies 30 for the interval 1992 to 1997. In this report it is stated that the total sample of papers 
selected for each of the two years was 1000. However, it appears that the figure of 1000 refers to 
the total pool from which the studied papers were drawn, and not the number of papers that were 
analyzed. The latter value is not known. 


The first question addressed in the study was to see if the standards that were applied for scores 
of 3 and 5 were the same across years. After examining the scoring criteria, assignment wording 
and the representative essays, it was concluded that the standards were consistent across years. 
Given that the standards were deemed to be the same, attention was turned to the number of 
students achieving those standards. In English 30, more students achieved a rating of Excellent 
in 1997 than in 1992. The proportions according to the figures presented in the report are 
approximately 14% vs. 6%. For Social Studies the 1997 performance exceeds the 1992 
performance at both levels. 


A second part of the 1997 study (AOT-7 page 31) reported on comparisons of students achieving 
Satisfactory and Excellent standards for all of the years from 1990 to 1997. All of the students 
writing in a school year are combined to produce the data. To make the comparison, the 
assumption is made that the standards have remained the same. Apart from the information 
provided for the two end years (1992 and 1997), no justification is given for assuming that the 
standards were equal in the intervening years. For this reason the comparisons have little to add 
to the AOT question. 


2. Biology 30 


In 1998 (AOT-8.2) a “Special Study” compared the extended written response question for 
Biology 30 June 1995 and June 1998. In Biology 30, the essay is marked on a 4-point scale. A 
score of 2 is Acceptable, and a score of 4 is Excellent. A panel of 8 readers examined 16 papers 
at the Acceptable level and a second panel of 8 examined 16 papers at the Excellent level. At 
both levels the papers had been selected to represent the mid range of performance. The readers 
examined the papers for thoughtfulness, complexity and correctness. The process was carried out 
for both years. The conclusions are based on only 8 papers per cell. 


Four questions guided the study: (1) Did the two standards change over time? (2) How were the 
questions themselves different on the two occasions? (3) What effect does the extended written 
response question have on student achievement over time? (4) Has student achievement at the two 
levels changed over time? 


Generally speaking it was found that the Acceptable standard had changed slightly so that more 
biology content was necessary for an Acceptable response in 1998. At the level of Excellent, 
student responses required greater depth, more complexity and sophistication in 1998 than in 
1995. Unfortunately the question used in 1998 was found to be a better question than the 
question used in 1995. It encouraged the students to use higher level content and to structure 
their answers in a better fashion. The percentage of students achieving both standards increased 
from 1995 to 1998. 


The 1998 Biology study shows that conducting qualitative AOT studies is not easy. By 
addressing the quality of the questions directly, the analysts have shown how difficult it is to 
isolate the change in performance from confounding factors. It is not surprising that the quality 
of examinations is improving: the examination developers (practicing teachers) use their 
experience and feedback from previous administrations to make the questions ‘better.’ For 
studying AOT, this has the unfortunate consequence of making it difficult to disentangle 
changes in typical achievement from improvements in psychometric methods, but from a 
psychometric perspective it is desirable. 


3. Mathematics 30 


In 2000, a different method was used to study AOT in mathematics (AOT-12). This method 
explored the difference in expectations, standards and marking procedures used for the written 
response questions in Mathematics 30 in June 1991 and June 2000. The researchers examined 
the curricula as defined in 1991 (one year prior to the introduction of a new mathematics 
curriculum), and again in 2000. In addition they compared statements of the standard of 
Excellence and the Acceptable standard in the two years. 


As one part of the study, 100 student papers were selected from the pool of papers saved in 1991 
that had total scores falling between 50% and 60%. An additional 100 papers were selected that 
had total scores between 80% and 90%. Using similar total score bands, two sets of 100 papers 
each were selected from the 2000 administration. Alberta Learning defines students whose 
papers fall within these two bands as meeting the overall standards of Acceptability and 
Excellence. In comparing this work with other studies it is important to note that in this study, 
the placement of students is based on their overall performance on the written and multiple 
choice examinations, whereas in the other studies, the comparisons were made between what 
constituted a Satisfactory (or Excellent) performance on a specific question in one year and what 
was required for that rating in a subsequent year. 


Based on the comparison of questions, scoring guides, curriculum outlines, and student 
responses, the study concluded that the expectations of students had changed. In comparison to 
students in 1991, the students of 2000 were required to interpret questions within a more total 
mathematics context, they had to create their own procedures rather than applying particular 
steps, and they had to demonstrate a higher level of mathematical knowledge and problem 
solving in order to receive high scores. 


The guides used for scoring the written responses in 2000 appeared to be more flexible than 
those in 1991, thus making it possible for students to receive partial credit for their work. Partly 
as a consequence of this the score distributions for “Acceptable” students appeared to be higher 
in 2000 than in 1991. Of course since the questions themselves were different, it is difficult to 
know how important this effect was. The direction was reversed for students at the “Excellent” 
level. The general observation made by the authors was that students in 2000 tended to be 
‘rewarded’ for what they knew, whereas students in 1991 tended to be ‘punished’ for what they 
did not know. 


In general, the study documents changes both in the way that the Mathematics 30 curriculum is 
conceived, and in the way that students are assessed. It has relatively little to say about the 
specific question of how the performance of students of 1991 would compare with the 
performance of current students. 


4. Mathematics 30, Chemistry 30, Biology 30 and Physics 30 (The 1991 study) 


The 1991 School Year Annual Report (AOT-1) refers to the mathematics/sciences study: 
multiple choice and written response (pages 50-53). None of these studies involves the 
specific examination of student papers. Rather they record the views and impressions of 
members of the student evaluation branch. As such they provide interesting anecdotal evidence 
of how the changes in examination format seem to relate to changes in general student 
performance. Since there is no way of checking these opinions against data, the study will not be 
reviewed. 


E. Discussion of AOT 


What general conclusions can be drawn from the various studies that have been conducted? 
Overall, when they have been used, the qualitative studies have usually shown some 
improvement over time. However, there have been relatively few of these studies, and they have 
not been carried out across a broad range of subjects. The anchor test studies occasionally 
show significant positive changes, but the differences are scattered and in the few instances 
where they can be cross-referenced with the qualitative studies they do not show much 
agreement. As noted in Section B, achievement is a broad construct, and it should not be 
surprising that different ways of assessing it lead to different conclusions. 


AOT-11 tries to clarify the problems underlying AOT studies of Diploma Examination results. In 
AOT-11 it is noted that the original basis for standards and scaling as set out in 1983, produced 
an incompatible mixture of norm and criterion referencing in the setting of standards of 
achievement. In addition, the guidelines that are intended to encourage high standards and 
improve instruction make it almost impossible to produce high quality AOT studies. Finally, 
there are several problems inherent in long term AOT studies: population differences, curricular 
differences, and instrument effects. In the second part of AOT-11, 9 ways in which AOT studies 
could be conducted in the Diploma Examination program are listed. This analysis is conservative 
in that it begins with the current constraints and looks at alternatives within those constraints. 
Had it begun with the question, how might AOT be carried out in “30-level” subject areas, then 
it might have proposed some more creative approaches to the problem. In any event the work 
found in AOT-11 does point out that professionals within the Learner Assessment Branch are 
aware of the shortcomings in the AOT program. 


The review of the AOT studies carried out in the present paper has been based on the brief 
(usually 3 or 4 pages) reports that are available in annual reports. No attempt has been made to 
look at the original data, to look closely at the techniques used, or to interview analysts. 
Nevertheless examination of the public documents is worthwhile if it speaks to the question of 
whether the information provided gives a valid portrayal of the situation. With respect to the 
present task, it is better to think of this review as professional opinion as bolstered by reasoned 
argument, rather than assessment against an agreed upon standard. 


The anchor studies are problematic for all of the reasons described above. The question that must 
be addressed is: Is the information that they provide better than no information at all? Not 
surprisingly, the answer is yes and no. Given the changes in population, curricular emphases and 
instructional support, the useful lifespan of an anchor test is almost certainly not 10 years. If the 
anchor test method is to be retained, a moving series of 3 or 4 years makes sense. Any time there 
is an obvious change in situation (curriculum, population, etc.), the process should be 
restarted. The current approach to test equating depends on student motivation. The anecdotal 
evidence is that students are not highly motivated to perform on the anchor tests. If this is true, it 
would be important to explore alternatives that would encourage them to perform at their highest 
level. If students are motivated to achieve on carefully constructed anchor tests, the equating 
approach for assessing short-term AOT is very efficient. 


This is not to say that achievement over longer periods is not important; the present anchor test 
method is simply inadequate to the task. It would be better to focus on specific areas to see if 
there has been improvement. I am impressed with the objectives and methods used in the 
qualitative studies. Rather than trying to talk about English achievement, or Biology 
achievement, these studies narrow the investigation to look at changes in specific aspects of 
essay writing. It is true that they fall into difficulty when the examination question in one year is 
demonstrably poorer than in a later year, but while that is a problem for making attributions 
concerning AOT, it is an asset from the point of view of improving the Diploma Examinations. 
The specific method employed in these studies has been to focus on two levels. While some 
might argue that this approach confounds differences in standard with differences in 
performance, it is my view that comparing “Excellent” essays from 199X with “Excellent” 
essays in 200X provides interesting information to administrators and others interested in public 
school education. In a responsive system, standards ought to change over time to meet the 
changing demands of the society. The qualitative analyses tell us about the nature of those 
changes, and how successfully students are meeting the changing standards. 


The process can be applied directly to the extended response sections of other subjects, but it 
could also be adapted to multiple-choice sections. For example, one could focus on two Diploma 
Examinations separated by 5 years and pick a specific content area (e.g. polynomial functions in 
mathematics), and look at the items (and their statistics) that fall within this small area. A 
qualitative analysis of the intellectual demands made by the questions, and the performances of 
the students in terms of correct responses made and kinds of erroneous alternatives selected 
would give a very interesting picture of changes in achievement. 
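
As a rough sketch of the item-level summary suggested above, the following Python computes, for a single multiple-choice item, the proportion of correct responses and the proportion of students choosing each erroneous alternative. The response strings and answer keys are hypothetical and are not drawn from any Diploma Examination; the function name is also an assumption. Because the items on two examinations five years apart are different questions, the numbers are meant to accompany a qualitative judgement of each item's intellectual demands, not to replace it.

```python
from collections import Counter


def summarize_item(responses, key):
    """Return the proportion correct and the proportion choosing each distractor."""
    if not responses:
        raise ValueError("responses must not be empty")
    counts = Counter(responses)
    total = len(responses)
    distractors = {
        option: counts[option] / total
        for option in sorted(counts)
        if option != key
    }
    return {"p_correct": counts[key] / total, "distractors": distractors}


# Hypothetical responses to one polynomial-functions item from each examination.
earlier_item = summarize_item(list("AABBBCDBBA"), key="B")
later_item = summarize_item(list("ACCCCCDCCB"), key="C")

for label, summary in (("earlier", earlier_item), ("later", later_item)):
    print(label, f"p_correct={summary['p_correct']:.2f}", summary["distractors"])
```

A panel could then read the two items side by side with these summaries, noting, for example, whether one distractor attracted a large share of students in the earlier year and what misconception it represents.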


To make the AOT studies useful, it is not sufficient to treat the data as self-explanatory. It is 
essential that interpretations be proposed and discussed by those with the expertise to do so. For 
this purpose test development and standard setting committees are ideal. To carry out the task, 
there is a need to provide or collect explanatory information. For example, AOT-11 used 
participation data to try to explain changes in performance. International studies have made use 
of questionnaires administered to teachers on various aspects of opportunity to learn. If Alberta 
Learning is interested in studying AOT, it must improve the design. I believe that this can be 
done without compromising the pedagogical integrity of the Diploma Examinations. 


II. National and International Studies of Educational Achievement 

A. Introduction 

Alberta has participated in several national and international studies of educational achievement. 
Of interest in the present analysis are the various SAIP (School Achievement Indicators 
Program) studies carried out by the Council of Ministers of Education (Science, 1996; 
Mathematics, 1997; and Reading and Writing, 1998) and TIMSS (the Third International 
Mathematics and Science Study). In addition, in 2000 the Canadian Education Statistics Council (a 
combined effort of the Council of Ministers of Education and Statistics Canada) collected 
achievement outcomes from the SAIP and TIMSS studies and combined them with several other 
educational indicators to produce “Educational Indicators in Canada.” 


B. Alberta’s Performance in Recent SAIP studies. 


Three reports were examined, the 1996 Science Assessment, 1997 Mathematics Assessment, and 
the 1998 Reading and Writing Assessment. In creating the assessments, test developers began 
with a theory that underlies short-term development (roughly from 13 years of age to 16 years of 
age) in each of the areas. While it is true that other continua of expertise would be defensible, the 
ones described are consistent with current best thought in curriculum development and are well 
aligned with the curricula across Canada. Within each continuum, the test developers located 
milestones or levels (1 = low and 5 = high). The location of these levels is conceptual, that is 
they are describable in terms of cognitive development and performance, and each level 
represents an important improvement in sophistication from the previous one. They are carefully 
explicated and lead directly to assessment tasks. In many (perhaps most) cases, tasks are 
presented to students that allow them to respond at whatever their level of capability. In some 
cases (e.g. some of the multiple choice items), tasks are specific to a level. While it is true that 
slippage is possible between level description and scoring, a great deal of attention has been 
directed toward making these assessments truly criterion referenced. This implies that the 
performance of students is located on a pedagogically meaningful scale. Moreover, taken as a 
model for assessment the SAIP approach shows teachers how good assessments that trace the 
path of growth for individual students can be constructed. 


The results of the studies are discussed below. 


1. 1996 Science - According to the report, “Alberta’s 13- and 16-year olds demonstrated 
levels of achievement in the written test that were significantly higher than those of 
Canada. The cumulative percentages of both age groups at levels 4 and 5 are higher than 
those of all other jurisdictions (page 46).” [Examination of the graphs reveals that in 
spite of what the report claims, the confidence intervals for Yukon overlap those of 
Alberta for both level 4 age 13, and level 5 age 16.] Having summarized the 
performance, and having provided a description of the Alberta science curriculum for 
junior high and high school, the document does not go forward and indicate what features 
of the Alberta system might have led to this excellent performance. 


2. 1997 Mathematics - The mathematics assessment carried out in 1997 was composed of 
two parts: mathematics content and problem solving. The results were summarized as, 
“For mathematics content, there is no significant difference between this jurisdiction’s 
[Alberta] performance and Canadian performance at all levels, except for levels 1 and 2 
for 13-year-olds (page 42).” “For mathematics problem solving, there are significant 
differences between this jurisdiction’s performance and Canadian performance at levels 
1, 2, and 3 for 13-year-olds, and at level 3 for 16-year-olds (page 46).” The structure of 
the assessment allowed comparisons with a similar assessment carried out in 1993. In the 
content section, at levels 4 and 5, students of Alberta showed very little change from 
1993 to 1997, but in the four common problems used in the problem solving section, 
performance in 1997 was better than in 1993. 


Comparisons between jurisdictions were not as explicit as in the 1996 Science study, but 
an examination of the jurisdictional plots shows that Alberta generally falls behind 
Quebec at both age levels and for performance levels 4 and 5, but is ahead of all other 
jurisdictions. The main comparison presented is between jurisdictions and the Canadian 
performance. At the upper levels (4 and 5) in content and problem solving, Alberta is 
above the Canadian values at both age 13 and 16, but the differences are seldom 
significant. As in the 1996 Science assessment no use was made of auxiliary information 
to provide suggestions for improvement. 


3. 1998 Reading and Writing - As the title suggests, the 1998 assessment was in two parts: 
reading and writing. The 1998 report summarizes the results as, “In this jurisdiction, there 
are no significant differences at any level between students’ achievement in reading and 
the English Canadian performance at either age. Just over three-quarters of 13-year-olds 
can interpret, evaluate, and explore surface and directly implied meanings in 
straightforward and some complex texts. About two-thirds of 16-year-olds achieve higher 
performance, demonstrating skills in developing complex meanings in complex texts and 
surface meaning in some sophisticated texts (page 45).” “In writing, the percentage of 
Alberta 13-year-olds reaching levels 4 and 5 is above the English Canadian performance. 
Levels 1-3 are not statistically different from English Canadian performance. Virtually all 
13-year-olds have at least some grasp of the elements of writing. Their writing has 
functional development and integration. It conveys a clear appropriate perspective. Errors 
are minor and do not interfere with meaning (page 46).” 


A previous assessment in 1994 allows for some over-time comparisons. Given that the 
levels are defined in terms of skills, these comparisons can be made without equating. At 
levels 4 and 5, reading performance declined from 1994 to 1998 for both 13 and 16-year-old 
students. The trend was exactly the opposite for writing, where the 
numbers of students performing at levels 4 and 5 rose at both age levels. 


Comparisons with the aggregated Canadian results showed Alberta to have significantly 
higher performance at levels 4 and 5 for writing at age 13, but not at age 16. The results 
for reading showed no significant differences at either of the top two levels for either age 
group. 

When comparisons are made with other jurisdictions, the Alberta performance in reading 
is undistinguished. In writing the competitive picture is much better particularly amongst 
13-year-old students where Alberta leads all other jurisdictions in proportions of students 
at level 4 or above. 


Once again, no attempt was made to account for strengths or weaknesses. 




What is to be made of Alberta’s results in the various SAIP endeavors? There is some comfort 
in the fact that, in most cases, Alberta is above the national average in the subject areas that have 
been covered. Moreover, the detailed scoring algorithms are excellent models for student 
assessment, but the fact that there is little in the way of analytically based interpretation to guide 
curriculum and instructional reform is regrettable. As noted earlier, data seldom stand on their 
own. In light of this, it is easy to see how readers of SAIP reports may be unable to answer the 
question that prompts the activity, “How well are our schools preparing students for a global 
economy and for lifelong learning?” (SAIP 1996 Science, page 5). 


C. Performance of Alberta Students on the Third International Mathematics and Science Study 
(TIMSS) 


Samples of students from Alberta schools participated in TIMSS. That study was notable for the 
careful attention that was paid to the quality of the assessment tasks. An achievement test was 
developed by a committee of curriculum experts representing several jurisdictions (provinces, 
countries) to measure important but common course objectives. Attempts were made to develop 
a test that was fair to all of the participants. Samples of students representing the target grades or 
ages were selected, and the test was administered. Jurisdictional averages were calculated and 
the results were presented by jurisdiction. According to the Government of Alberta News 
Release of June 11, 1997, “Alberta’s grade 4 students outranked all other English speaking 
participants... Alberta students achieved third in science and seventh in mathematics, compared 
to students in 26 countries and provinces taking part in the study.” On February 24, 1998, the 
results for grade 12 students were reported as, “... Alberta students achieved the third highest 
score in science literacy and the fifth highest score in mathematics literacy, compared to students 
in the 24 countries and provinces taking part in the study.” 


Although press releases give a necessarily brief summary, it is possible to garner exceedingly 
useful information from the original documents and technical reports. TIMSS was much more 
than a simple interjurisdictional race. In addition to the carefully crafted tests, questionnaires on 
attitudes, characteristics of teaching, and learning backgrounds were distributed to students, 
teachers and administrators. An analysis was made of over 1000 mathematics and science 
textbooks used throughout the participating countries. Videotapes of between 50 and 100 
mathematics classes in each of Germany, Japan, and United States were made and analyzed, and 
there were case studies of educational policies and practices in the same three countries. In all, 
the non-test part of TIMSS provided a rich source of material for explanation and informed 
speculation. It provides a model upon which national and international studies can build. A 
particularly useful document was produced by the Committee on Science Education K-12 and 
the Mathematical Sciences Education Board of the National Research Council in the United 
States which has the subtitle, “Using TIMSS to Improve U. S. Mathematics and Science 
Education” [See National Research Council 1999]. Within the document the writers highlight 
the findings of the various facets of the studies, provide some clarification and integration of 
findings, and in doing so lay out a basis for improvement. Macnab (2000) describes how some 
countries used the results to change their programs. 




III Conclusions 


The achievement over time anchor test studies provide little evidence to support a claim of 
pedagogically important differences (either positive or negative) in average achievement in 
Diploma Examination subjects over time. Part of the problem lies with the psychometric 
strategies employed, but it is likely also the case that differences in global (i.e. wide range of 
content, wide range of students) achievement are robust in relation to the particular influences 
that are likely to change performance at the individual and classroom levels. 


National studies indicate that Alberta students are generally competitive with their counterparts 
in other provinces, but the reports do not in themselves offer much of a basis for change or 
improvement. 


IV Recommendations


Although some of these recommendations go beyond the data, it is important to risk criticism in 
the hope that debate is generated. Ten recommendations and brief explanations are shown below. 


Recommendations: 

1. The Learner Assessment Branch of Alberta Learning should, as a matter of course, create a 
technical manual that describes exactly the processes that are used for all of their 
psychometric activities. The professionals who work in the Student Evaluation Branch are as 
competent a group as can be found anywhere, and there is no reason to distrust their 
judgment. Moreover, they are extremely busy with their regular tasks. Nevertheless, for all
sorts of reasons (legal, operational, and professional), it is very important to provide an
accurate trace of the various analytical processes that are used to create the reports upon 
which government policy might depend. 


2. The anchor test approach to studying achievement over time is an excellent procedure to use, 
but in the environment of the Diploma Examination program, it is difficult to do well. If the 
existing constraints are to be maintained, special studies must be designed so that student 
motivation on the anchor test approximates motivation on the corresponding Diploma 
Examination. The specific procedure for obtaining the anchor test sample, and the way in 
which anchor test data are collected do not appear in the annual reports. Comments in other 
reports, and informal statements by people connected with the branch suggest that the level 
of commitment displayed by students in the anchor sample may not be equivalent to that of 
students under Diploma Examination conditions. If this is true it calls into question the 
credibility of the equating process. 


3. The anchor test design should restrict the use of a specific anchor test to three or four years. 
It is difficult to believe that the validity of anchor tests remains constant over a ten-year 
period. Indeed, this is acknowledged when significant curriculum change has occurred. To 
be sure that the anchor test behaves in a consistent manner, it is important to adjust it to 
model changing emphases in the Diploma Examination. This will entail a more complex 
overlapping approach to equating. 




4. Changes in performance should be reported for upper and lower students (e.g. 10th and 90th
percentile) as well as equated means. Some of the most interesting differences may appear at
the ends of the achievement continuum. These should be documented.


5. Differences in achievement over time should be expressed in standardized units (effect sizes)
and not as significance levels. As noted in at least one of the reports, a significant difference
may reflect a difference of only one or two items. To confront readers with the pedagogically
relevant magnitude of change, the effect size metric would be far more useful.


6. The credibility of assumptions required by the anchor test design should be examined
regularly. At the simplest, this would involve examining the regression plots of diploma
results on anchor test results and comparing them over time. Equally useful would be to compare
the distributions of diploma exam results for the anchor test sample with those of the total
group. It would be reassuring to find that the anchor test group was an adequate, representative
sample of the total group.


7. If long-range, broad-based achievement-over-time studies are desired [for example, studies of
Physics 30 over 10 years], special studies should be designed and implemented. See, for
example, Clarke, Nyberg and Worth (1977). Although the study had many flaws, it does
address the simple question in a direct manner. Of course the problem of motivation does
arise and suitable incentives would need to be used to overcome it.


8. Specific aspects of achievement over time (e.g. specific reporting skills, problem solving,
and laboratory investigations) should be examined through carefully designed studies. The
qualitative studies described in the body of the report are good examples, but quantitatively
oriented studies could also be undertaken using specifically focused tests administered to
representative samples of students spanning an interesting time interval. Other approaches
might involve prospective studies that are carried out in anticipation of curriculum reform.
Assessments could be developed that focus on particular characteristics of the reform, and a
simple pre-reform/post-reform design used to assess the change of interest.


9. Factors that are known to influence achievement over time should be investigated as part of
the AOT design so that explanations for differences can be proposed. [The Alberta Learning
website could offer opportunity for commentary and debate]. These explanations might also
prompt research into the issues of interest. Alberta Learning could fund important projects,
but individual jurisdictions could examine local questions.


10. Alberta Learning should only participate in national and international projects that are
designed to provide explanations of the results. Aspects of TIMSS provide good examples.
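

The reporting suggested in the recommendations on percentiles and effect sizes can be illustrated with a minimal sketch. It assumes two lists of equated percentage scores for two examination years; the numbers below are invented for illustration and are not Alberta results. Cohen's d with a pooled standard deviation is used as one common effect-size definition; the report does not say which metric the author had in mind, and the function names are hypothetical.

```python
import math
import statistics


def cohens_d(earlier, later):
    """Standardized mean difference (later minus earlier), pooled sample SD."""
    n1, n2 = len(earlier), len(later)
    var1 = statistics.variance(earlier)
    var2 = statistics.variance(later)
    pooled_sd = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    return (statistics.mean(later) - statistics.mean(earlier)) / pooled_sd


def tail_percentiles(scores):
    """Return the (10th, 90th) percentile cut points of a list of scores."""
    deciles = statistics.quantiles(scores, n=10)  # nine decile cut points
    return deciles[0], deciles[-1]


# Hypothetical scores, invented for illustration only.
scores_year_a = [48, 55, 61, 63, 66, 68, 70, 74, 79, 88]
scores_year_b = [50, 57, 62, 65, 67, 70, 72, 75, 81, 90]

d = cohens_d(scores_year_a, scores_year_b)
p10_a, p90_a = tail_percentiles(scores_year_a)
p10_b, p90_b = tail_percentiles(scores_year_b)

print(f"effect size d = {d:.2f}")
print(f"10th percentile change = {p10_b - p10_a:.1f}")
print(f"90th percentile change = {p90_b - p90_a:.1f}")
```

Reporting d alongside the change at each tail keeps a statistically significant but small shift from being read as a pedagogically important one.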




Reports Relating to Achievement over Time 

AOT-1 1990-91 School Year Annual Report Section 6
AOT-2 1991-92 School Year Annual Report Section 6
AOT-3 1992-93 School Year Annual Report Section 6
AOT-4 1993-94 School Year Annual Report Section 6
AOT-5 1994-95 School Year Annual Report Section 6
AOT-6 1995-96 School Year Annual Report Section 6 


AOT-7 1996-97 School Year Annual Report Section 5 Special Study: English 30 and Social 
Studies 30 Diploma Examinations 


AOT-8.1 1997-98 School Year Annual Report Section 5 Special Study: Achievement over time 
for the multiple-choice section of English 30 and social studies 30 Examinations 1989- 
1998 


AOT-8.2 1997-98 School Year Annual Report Section 6 Special Study: Achievement-Over- 
Time for the Biology 30 Extended Written Response Question (June 1995 and June 1998 
Comparison) 


AOT-9 Learner Assessment Branch Paper 2000, Special Study: Achievement-over-time results
on the machine-scored component of selected Diploma Examinations. (Original version
Jan 12, 2000, Revised version undated).


AOT-10 Learner Assessment Branch Report 1991: Patterns and Processes Approaches to
writing by grade 12 students.


AOT-11 Learner Assessment Branch Paper, April 6, 1999: Comparing Diploma Examination 
Results over Time. Mimeo. 


AOT-12 Special Study. Achievement-over-time for Mathematics 30 Written-Response 
Questions (June 1991 and June 2000 Comparison). Draft August 2000. 




National Studies 


Council of Ministers of Education: School Achievement Indicators Program (SAIP) - 1996
Science 


Council of Ministers of Education: School Achievement Indicators Program (SAIP) — 1997 
Mathematics 


Council of Ministers of Education: School Achievement Indicators Program (SAIP) — 1998 
Reading and Writing 


2000 Canadian Education Statistics Council (2000). Report of the Pan-Can Indicators Program 
1999 


Other References. 


Angoff, W. H. (1971) Scales, norms and equivalent scores. In R. L. Thorndike (Ed.) Educational 
Measurement (2nd Edition, pages 508-600) Washington DC: American Council on
Education. 


Angoff, W. H. (1982) Summary and derivation of equating methods used at ETS. In P. W. 
Holland and D. B. Rubin (Eds.) Test Equating (pages 55 — 69) New York: Academic Press. 


Blackmore, D. E. (1980) A Latent Trait Study of Item Bias and Achievement Differences, Ph.D. 
Thesis, University of Alberta. 


Clarke, S.C.T., Nyberg, V., and Worth, W. H. (1977) Edmonton Grade III Achievement Study 
1956-1971 Comparison. Edmonton: Alberta Education. 


Macnab, D. S. (2000) Forces for change in mathematics education: The case for TIMSS. 
Education Policy Analysis Archives, 8, 15. (An electronic journal: 
http://epaa.asu.edu/epaa) 


National Research Council (1999) Global perspectives for local action. Using TIMSS to improve 
U.S. Mathematics and Science Education. Washington DC: National Academy Press. 




Figure 1
A Comparison of Equated Means 1990-98, English 30
Alberta Education School Report 1998 vs. AOT-9
[Plot: Alberta Ed. vs. AOT-9 (English); axes ENG30AE and ENG30 AOT-9]


Figure 2
A Comparison of Equated Means 1992-1998
Alberta Education School Report 1998 vs. AOT-9
[Plot: Alberta Ed. vs. AOT-9 (Social Studies); axes SOC30AE and SOC30 AOT-9]


Appendix 2-1 


SPECIAL STUDY 


Achievement-Over-Time 
for Mathematics 30 Written-Response Questions 
(June 1991 and June 2000 Comparison)


Alberta Learning


Introduction 


As part of Alberta Learning’s commitment to improving student achievement in mathematics, a 
working group was established to consider student achievement in Mathematics 30 over time. In 
particular, the working group, which met in July 2000, examined student achievement over time 
on the extended written-response questions of the Mathematics 30 Diploma Examination. 
Specifically, the purpose of the study was to respond to the question: 


1. In what ways have the expectations of what students have to do changed from 1991 to 
2000 as related to the mathematical content and processes outlined by NCTM and as 
demonstrated on the written-response questions on the diploma examinations? 


Also posed to the working group were the following questions: 


2. In what ways have the standards for mathematics examinations changed from 1991 to 
2000? 
3. In what ways has the marking of diploma examinations changed from 1991 to 2000?


Although the working group was charged with answering the above questions, they were asked 
to do so in qualitative terms rather than quantitative ones; hence, the group asked themselves the 
questions: 


4. In what ways have the standards changed?

5. In what ways has the marking changed?

6. In what ways have written-response questions at the acceptable standard and standard of
excellence changed?


Responses to these questions make up the body of this report. Any claims made are based on
the professional interpretation and judgement of the working group.




Procedure 


In order to prepare for the task of analyzing and interpreting the questions and responses to the 
written-response questions for 1991 to 2000, five experienced mathematics teachers and two 
post-secondary professors considered the Program of Studies for Mathematics 30 for 1991 and 
2000, the NCTM standards for 1991 and 2000, and the standards for acceptable and excellent as 
explained in the curriculum standards document. After much discussion on how standards have 
remained the same and how they have changed, the three written-response questions and their 
corresponding scoring guides for the June 1991 and the June 2000 diploma examinations were 
examined. Finally, student examination papers were studied. Special attention was paid to the 
way in which students at the acceptable standard and students at the standard of excellence 
responded to the questions on the two examinations. 


Papers at the acceptable standard that were studied had all received between 50% and 60%, and 
those at the standard of excellence were all between 80% and 90%. 


At each of the acceptable standard and the standard of excellence, 100 papers were selected for 
each of the years 1991 and 2000. The group first looked at 100 papers at the acceptable 
standard from 1991 and listed characteristics of the students’ solutions. The group then did the 
same with 100 papers at the acceptable standard from 2000. The three questions and the student 
responses from each examination were looked at individually and similarities and differences 
were noted. The same procedure was followed with papers at the standard of excellence. The 
group looked at actual student work on the examination, as well as at statistical data supplied by 
Alberta Learning. 


Observations 


Contents and Standards 

The year 1991 was the last year of the previous curriculum. Diploma examinations reflected a 
new curriculum introduced in 1992. Therefore, expectations for students changed from 1991 to 
2000 in terms of course content for Mathematics 30. The logarithms and polynomial functions 
units remained very similar. A unit on permutations was added, and changes were made to the
trigonometry, quadratic relations, sequences and series, and statistics units. For more detailed
information refer to Appendix 1. Another change to the 2000 curriculum was that more
powerful graphing calculators and computers were widely used. 


In an examination of the Mathematics 30 Information Bulletin for 1991 and 2000, some changes 
were seen in terms of performance standards. 


According to the 1990-91 Bulletin, to obtain the acceptable standard, students had to
demonstrate mathematical skills and knowledge in the six content strands. To attain the
standard of excellence, students had to demonstrate a thorough knowledge of mathematics and of
the skills necessary to apply their knowledge to new situations. Examples were given illustrating
these requirements. In the 1990-91 Bulletin, there were general statements describing the
expectation for the acceptable standard and the standard of excellence.




The 1999-2000 Bulletin was much more detailed than was the 1990-91 Bulletin. Student
expectations, as related to curriculum content, are listed for the acceptable standard and the 
standard of excellence. Illustrated examples of each standard were provided. In addition to 
statements that contained mathematical procedures and conceptual knowledge, there were many 
statements that used the following verbs and phrases: “explain,” “derive,” “recognize,” “verify,” 
“use graphing calculators or computers to find,” “explain orally and in writing,” “describe the 
effects,” or “describe what happens,” “describe the relationships,” and “complete the solution to 
problems that can be represented by.” This terminology is evidence that the NCTM standards of 
connections, communication, estimation and mental calculation, problem-solving, reasoning, 
technology, and visualization were incorporated into the standards for the 2000 examinations. 




In 1991, students who attained the standard of excellence were required to answer most open- 
ended questions at all four cognitive levels (knowledge, comprehension, application, higher 
mental activities). Compared with the type of question asked in 2000, the 1991 written-response 
questions were quite procedural and did not explicitly demand higher mental activity such as 
reasoning deductively or providing justifications and explanations.


In 2000, the written-response section focused on students’ understanding of the process of 
solving a problem and encouraged students to take risks to arrive at a solution. To achieve the 
standard of excellence, students had to select a strategy, carry it through, and solve the problem 
correctly. The written-response questions focused on students’ understanding of mathematical 
concepts and allowed students flexibility in demonstrating their communication and problem- 
solving abilities in mathematics. The written-response questions required students to solve, 
explain, justify, or prove their solution. 


Written-Response Questions 

The questions on the 1991 examination used the directing words “find,” “determine,” “show,”
and the term “express b in terms of a.” On the 2000 examination, questions were expressed
using the verbs and expressions “determine,” “justify,” “write a general expression,” “what
type,” “how many,” and “explain.” The 1991 examination asked students to perform primary
algebraic procedures and symbolic manipulations to determine answers. The students writing
the 2000 examination were required to explain or justify with sentence answers as well as to
perform algebraic/symbolic manipulation.

In 1991, the information given on the written-response questions used straightforward algebraic
and sentence statements about the concepts. For example, written-response 1 began “The graph
of ...,” written-response 2 started with “The first and second terms of a sequence ...,” and
written-response 3 contained sentences that began with “A circle with ...,” and “A parabola
with ...”. In 2000, questions were stated in various contexts. Connections were made to the real
world as in the first question, which made reference to hamburger toppings. In addition to
information stated in sentences, there were various visual representations of the data. In written-
response question 1, a diagram of Pascal’s triangle and a chart summarizing the data were given.
Written-response question 2 contained a graphical representation of an ellipse, and written-
response question 3 had geometric drawings of the generations of Koch’s “snowflake.” The
written-response questions on the 1991 examination were similar to the questions found in the
course text. Questions on the 2000 examination were stated in contexts that the students may not
have seen before.


In summary, the questions in 1991 were multistep procedural questions. For example, in
written-response questions 2 and 3, a student would need the information from part a and part b
in order to successfully complete part c. On the 2000 examination, parts of a question tested
various concepts of the topic being evaluated but were somewhat independent of each other
(written-response 2). Questions on the 2000 examination began with simple lead-in parts that
required simple arithmetic calculations or substitution into a formula. In written-response
questions 1 and 3, students were then required to generalize the pattern that they derived, and
finally to present a problem-solving application of the concept. Whereas the 1991
written-response questions could have been posed as multiple-choice questions, the questions
posed on the written-response part of the 2000 examination would not have been appropriate in
this form.


Marking 1991 and 2000 Papers 


In 1991, the scoring guide for marking written-response questions had two main categories for 
mark deduction. 


The first category was for arithmetic error. A maximum of one mark per examination was
deducted for such things as computation error, incorrect numerical substitution, incorrect
copying of a formula, or incorrect rounding.


The second category was for conceptual error. Up to two marks were deducted for such things as
using incorrect formulas for the circle or parabola in written-response question 3, determining
the common difference or the common ratio incorrectly in written-response question 2, or
indicating that P(-1) is 0 or that (x + 1) is a factor in written-response question 1.


A maximum of one mark was deducted per examination if answers contained illogical
mathematical statements. For example, in written-response 1, if a student wrote
P(-1) = 1 + 5 + 4 + c - 8 = c - 2, c = 2, or used the addition method and wrote x + 1 instead of 1,
one mark was deducted.


The 1991 scoring guide also listed the following general points. A student could not lose marks
already earned. For example, if a student had already earned 1 mark (2 marks, 3 marks, etc.) and
then made an error, he/she would receive at least 1 mark (2 marks, 3 marks, etc.) for the
question. In the case of an arithmetic error, the maximum mark was decreased by 1 and the
balance was marked for consistency. In the case of a conceptual error, the maximum mark was
decreased by 2, and the balance of the question was marked for consistency. If a correct
numerical answer was given but no solution or justification was shown, then only 1 mark was
given for each answer.


There was no deduction for an incorrect transcription of an answer to an answer line. If 
interpretation was in doubt, the student was always given the benefit of the doubt. 


The scoring guide was followed by six sample solutions for written-response 1, four sample 
solutions for written-response 2, and four sample solutions for written-response 3. 


In 2000, the markers were given possible solutions for each question but a five-point rubric 
accompanied each question to allow for a more holistic approach to marking. The five-point 
rubric for written-response question 1 follows.


5 marks: The student demonstrates an understanding of the problem. He/she uses appropriate 
mathematical knowledge and problem-solving techniques to find the solution. He/she 
also justifies and explains the relevance to the problem. 


4 marks: The student demonstrates an understanding of the problem and uses appropriate
knowledge and problem-solving techniques to find a solution; however, the solution
contains a minor flaw. The student shows justification for his/her results.


3 marks: The student demonstrates some understanding of the problem, and uses appropriate 
mathematical knowledge and problem-solving techniques to find partial solutions. 
The student communicates little understanding of the complexities of the problem but 
does formulate some concepts of the problem mathematically. 


2 marks: The student explores initial stages of the problem and applies some relevant 
mathematical knowledge and problem-solving techniques to find partial solutions. 


1 mark: The student explores initial stages of the problem and applies some relevant
mathematical knowledge and problem-solving techniques in working toward a 
solution. 


In scoring the written-response section of the examinations, markers evaluated how well students
• understood the problem or the mathematical concept

• correctly used the mathematics

• used problem-solving strategies and explained their answer and procedures

• communicated their solutions and mathematical ideas


In 2000, students were encouraged to try to solve the problem. Even an attempt could be worth 
some marks. Students could still do algebraic manipulation. Students in 1991 selected 
appropriate formulas from a formula sheet. In 2000, students could select formulas but a number 
of them seemed to create their own. 




Frequency Distribution of Marks for Sample Examinations From 1991 and 2000 


The following tables represent the frequency distribution of marks that students achieved on the 
three written-response questions for each year. NR means “no response,” for which no marks 
are awarded. In 1991, two questions were worth four marks and one was worth five. In 2000, 
all three questions were worth five marks. The “Average” is the average for the 100 papers 
studied. The bolded average at the bottom is the cumulative average of the three questions. 


Distribution of Marks at the Acceptable Standard, 1991
(100 Papers Studied)

         NR     0     1     2     3     4     5   Average
WR 1     16    40     8     1     9    26     -    36.25%
WR 2      7    25    10    39     4    10     5    33.80%
WR 3      1     6    58    29     5     1     -    33.75%
                                                   34.60%


Distribution of Marks at the Acceptable Standard, 2000
(100 Papers Studied)

         NR     0     1     2     3     4     5   Average
WR 1                 26    21    29    21     1    48.80%
WR 2                 16    40    30     7     3    45.80%
WR 3      0     1    26    54     9     4     6    41.40%
                                                   45.30%
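

The reported averages appear to be the mean mark expressed as a percentage of the question's maximum mark, with NR counted as zero. That reading is an inference from the tables rather than a rule stated in the report, so the sketch below only checks it against written-response question 3 at the acceptable standard. The helper name `average_percent` is an illustrative assumption.

```python
def average_percent(counts, max_mark):
    """Mean mark as a percentage of max_mark.

    counts maps each mark to the number of papers that received it.
    The key "NR" (no response) earns no marks, as the report states.
    """
    papers = sum(counts.values())
    points = sum(mark * n for mark, n in counts.items() if mark != "NR")
    return 100 * points / (papers * max_mark)


# Written-response 3, acceptable standard, as read from the tables above.
wr3_1991 = {"NR": 1, 0: 6, 1: 58, 2: 29, 3: 5, 4: 1}        # marked out of 4
wr3_2000 = {"NR": 0, 0: 1, 1: 26, 2: 54, 3: 9, 4: 4, 5: 6}  # marked out of 5

print(f"{average_percent(wr3_1991, 4):.2f}%")  # 33.75%
print(f"{average_percent(wr3_2000, 5):.2f}%")  # 41.40%
```

Expressing each average as a share of the maximum mark is what makes a question marked out of 4 comparable with one marked out of 5.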


The working group made the following observations. 

• There appears to have been more opportunity in 2000 for students to achieve a passing grade
on the written-response questions.

• In 1991, students who knew something about the question were given zero because the
marking key was quite rigid. In the first two parts of the question, a mark was awarded for the
correct answer only, and the questions were quite procedural by nature.

• In 2000, because of a shift to more holistic marking, students who knew something about the
question could get at least one mark.

• The number of answers awarded NR or zero was greatly reduced from 1991 to 2000.




Distribution of Marks at the Standard of Excellence, 1991
(100 Papers Studied)

         NR     0     1     2     3     4     5   Average
WR 1      0     3     2     4     9    82     -    91.25%
WR 2      0     0           6    10    10    73    89.60%
WR 3      0     0    24    34    37     5     -    55.75%
                                                   78.87%


Distribution of Marks at the Standard of Excellence, 2000
(100 Papers Studied)

         NR     0     1     2     3     4     5   Average
WR 1      0     0     0     9    19    37    35    79.60%
WR 2      0     0     0     9    26    22    43    79.80%
WR 3      0     0     1     6    14    18    61    86.40%
                                                   81.93%


The working group made the following observations. 


• Averages were quite high for these papers, except for written-response 3 in 1991, in which
many students had the wrong equation or did not check for non-permissible values.

• In 2000, students achieving the standard of excellence generally received marks of 3, 4, or 5
for questions that required justification or explanation; whereas, in 1991, students mainly
received marks of 4 or 5, except for written-response question 3, where students received marks
from 1 to 3. The average on written-response question 3 was also quite low as compared with the
other two questions.

• In 2000, students achieving the standard of excellence achieved marks of 3, 4, or 5 for
questions that required justification or explanation; whereas, in 1991, there was not as great a
spread in the marks with most students achieving a perfect score except for written-response
question 3.

• Students achieving the standard of excellence in 1991 and 2000 performed algebraic
procedures systematically and clearly.
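

The remark that most 1991 students at the standard of excellence achieved a perfect score, except on written-response question 3, can be checked against the tables above. The sketch below is an illustration only: it computes the share of papers that received the maximum mark, and the helper name `top_mark_share` is an assumption introduced here.

```python
def top_mark_share(counts, max_mark):
    """Fraction of papers that received the maximum mark."""
    return counts.get(max_mark, 0) / sum(counts.values())


# Standard of excellence, as read from the tables above (no NR papers).
wr1_1991 = {0: 3, 1: 2, 2: 4, 3: 9, 4: 82}          # marked out of 4
wr3_1991 = {0: 0, 1: 24, 2: 34, 3: 37, 4: 5}        # marked out of 4
wr1_2000 = {0: 0, 1: 0, 2: 9, 3: 19, 4: 37, 5: 35}  # marked out of 5

print(top_mark_share(wr1_1991, 4))  # 0.82
print(top_mark_share(wr3_1991, 4))  # 0.05
print(top_mark_share(wr1_2000, 5))  # 0.35
```

On these figures, 82 of 100 papers were perfect on question 1 in 1991 but only 5 on question 3, while in 2000 the top mark on question 1 went to 35 papers, with the rest spread across marks of 2 to 4.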




Conclusions 


In What Ways Have the Expectations Changed for Mathematics 30 Students from 1991 to 
2000? 


The expectations for Mathematics 30 students have definitely changed from 1991 to 2000. In
1991, students were taught mathematics to solve problems but the problem-solving involved
step-by-step solutions. Problem-solving was not integrated into the mathematics curriculum. 


In 2000, Communication and Connections emphasized problem-solving as a process. There is 
also a definite emphasis on using technology. 


Further, the way in which questions are phrased has changed. In addition to reading, students 
must now be able to interpret. 


The way in which examinations are marked has also changed. Now, there is a range in the top 
mark of 5; whereas, in 1991, a 5 meant that the answer was perfect or flawless. Previously, a 
student could get 100% by memorizing procedures but did not necessarily have to have complete 
understanding. In addition to accuracy, a student now must show understanding by answering 
questions that require showing and justifying and explaining. 


In 1991, written-response questions comprised 20% of the examination; whereas, in 2000, they 
comprise 30%. In terms of written-responses, students are now expected to communicate and 
demonstrate more of their mathematical knowledge than before. 


In 1991, the written-response questions were mainly free of context. Now, they are more likely 
than not set in a specific context. In 2000, students at the standard of excellence are expected
to, and are able to, interpret context. They are able to create their own formulas, generalize
from a pattern, and explain their answers. In the view of the marking group, the level of
mathematical knowledge, reasoning, and problem-solving required to successfully complete the
questions has increased.


In 2000, the written-response questions have more multiple representations and connections. 
Students are expected to visualize and interpret the characteristics and changes that occur to a 
graph and its function. This is related to more use of graphing technology. 


How Have the Standards Changed for Mathematics 30 Students from 1991 to 2000? 


The standards stated in the 1990-91 Mathematics 30 Information Bulletin were general
statements. In the 2000 Mathematics 30 Information Bulletin, the acceptable and excellent 
standards were explicitly stated through the listing of curriculum concepts and the incorporation 
of the NCTM processes. A list of directing words was also included. The specific statements of 
standards were written primarily to inform Mathematics 30 teachers about the extent to which 
students must know the Mathematics 30 content and be able to demonstrate the required skills. 




In 1991, communication was evaluated in relation to the logic and accuracy in a student’s 
mathematical statements and symbolic manipulation. In addition to these skills, the 2000 
examinations required students to communicate concepts and results by justifying and explaining 
conclusions and mathematical concepts. 


These clearly defined changes with respect to content and processes, and the holistic marking
rubric, appear to have allowed a student in 2000 more opportunity to achieve the acceptable
standard since students were rewarded for the mathematics they knew. They did not have to
have a specific statement or correct answer in order to receive a mark. Results seem to indicate
that these changes differentiated the students at the standard of excellence (who received 3, 4,
or 5 out of 5) in 2000.


In 1991, to receive 4 out of 4, or 5 out of 5 on a written-response question, a student had to give 
a perfect response with no flaws or errors. In 2000, a response could have a minor flaw and still 
be awarded a mark of 5 out of 5. A response that was not perfect might still be considered 
excellent. 


How Has the Marking of the Written-Response for Mathematics 30 Changed? 


In 1991, markers tended to focus on the correct answer. They did not seem to be looking for 
student understanding as much as for the answer. Algebraic solutions were stressed in problem 
solving. At the standard of excellence, responses had to be very explicit and well 
formed. 


In 2000, students got marks for responses that were less formal or less explicit. Students tended to 
attempt to annotate their solutions with explanations. There were, however, many very explicit and 
well-formed solutions in 2000, as evidenced by the papers viewed. 


In 2000, algebraic solutions were still important but with more emphasis on geometric 
representations, possibly as a result of the increased use of graphing calculators. Students are 
now expected to communicate their mathematical knowledge in their responses. As well, 
students are expected to show more reasoning. The working group’s consensus was summed up 
in the statement, “There is more cerebral activity.” In 1991, markers looked for mistakes and 
deducted marks; whereas in 2000, students were rewarded for knowledge. 


Special Acknowledgements 

Special thanks go to Cynthia Ballheim, Paul Downes, Perry Kulmatyski, Indy Langu, Elaine 
Simmt, and Marge Marika for their valuable contributions to this paper. Special thanks go to 
Rebecca Kallal for editing and Rebecca Meyers for typing. 

Sincerely, Ross Marian 




Appendix 1 


Achievement-Over-Time 


for Mathematics 30 Written-Response Question 


(June 1991 and June 2000 Comparison) 


The Curriculum Content remained the same, except for the following differences. 


1991    2000 
Trigonometry    * sine and cosine law    * discuss parameters in terms of 
* coordinates and paths on unit    transformations when 
circle    comparing two graphs 
* area of regular polygon    * given graph, find function and 
* parameters used to describe    zeros 


graph 
given function, find zeros 


Quadratic    * standard form: relationship    * locus definitions, eccentricity 
Relations    between parameters a, b, c    * general form: relation of 
* extensive terminology such as    parameters to graph of relations 
major and minor axis, conjugate    * introduction of graphing 
and transverse axes, asymptotes    techniques, visualization of 
* application questions    conics defined by locus 
* algorithm to solve    definition 
* graphing and transformations 
* intersection of plane and conical 
surface 
Statistics * standard deviation 


probability (experimental, 
theoretical frequency 
distribution) 




Sequences and    * limits of various functions, and    recursive definition 
Series    infinite convergent sequences 
sum of infinite convergent 
geometric series 

convergent and divergent 
sequences 

annuities 


Polynomial    use of graphing calculator to 
Functions    graph 


Exponents and use of technology to graph 


Logarithms 


Permutations    * not a topic in 1991 curriculum    all of the present content 
and 


Combinations 


Note: In 1991, the above topics represented the core component (80% of the curriculum). 
There was an elective component (20% of the Mathematics 30 course) which could be 
one or more of the following topics: Arrangements and Selections, History of 
Mathematics, Mathematical Induction, Matrices, Probability, Topology, and Vectors. 




Written-Response Questions 


1991 2000 


Scoring Guide 


Total Mark/ 
Question 
Emphasis 


% of Final 
Mark 


Format 


* Analytic: a mark awarded for a    * Holistic: look at entire question 


required statement or correct    to determine mark 
answer 

* 4 or 5 marks/question    * 5 marks/question 

* 20%    * 30% 

* primarily algebraic and    * lead-in question (usually 
symbolic manipulation    arithmetic calculation) 


* not all parts of question 
dependent on each other 

* could require an explanation or 
justification in sentence answer 




Appendix 2 


Achievement-Over-Time 
for Mathematics 30 Written-Response Question 
(June 1991 and June 2000 Comparison) 


Examination Specifications and Design 


Core Content: Percentage Emphasis 1991 2000 
Trigonometry 2 12 
Quadratic Relations 22 10 


Sequences and Series 
Statistics 
Logarithms and Exponents 


Polynomial Functions 


Permutations and Combinations 


10 
70* 
entire test | machine-scored part of test 

* This is the percentage of the total test related to the machine-scored portion (70%) of the test. 
Written-response questions may fall into many categories. 


                      1991                      2000 
Question Format       Number of   Percent       Number of   Percent 
                      Questions   Emphasis      Questions   Emphasis 
Multiple-Choice       40          61.5          40          57 
Numerical-Response    12          18.5          9           13 
Written-Response      3           20.0          3           30 
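
The percent emphasis figures in the table above are consistent with every machine-scored question carrying equal weight within the machine-scored share of the test (80% in 1991 and 70% in 2000, that is, 100% minus the written-response emphasis). The short Python sketch below checks that arithmetic. The equal-weighting rule is an assumption inferred from the published figures, not a rule stated in the examination specifications, and the function name is illustrative only.

```python
def percent_emphasis(counts, machine_scored_share):
    """Split the machine-scored share of a test evenly across its questions.

    counts: mapping of question format -> number of questions
    machine_scored_share: percent of the whole test that is machine scored
    Assumption (not from the report): every machine-scored question has equal weight.
    """
    total_questions = sum(counts.values())
    per_question = machine_scored_share / total_questions
    return {fmt: n * per_question for fmt, n in counts.items()}


# Question counts come from the table; the shares are 100 minus the
# written-response emphasis for each year (20% in 1991, 30% in 2000).
emphasis_1991 = percent_emphasis(
    {"Multiple-Choice": 40, "Numerical-Response": 12}, 80.0
)
emphasis_2000 = percent_emphasis(
    {"Multiple-Choice": 40, "Numerical-Response": 9}, 70.0
)

for year, result in (("1991", emphasis_1991), ("2000", emphasis_2000)):
    for fmt, pct in result.items():
        print(f"{year} {fmt}: {pct:.1f}%")
```

Under this assumption the 1991 values round to the table's 61.5 and 18.5, and the 2000 values round to the table's whole-number figures of 57 and 13.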




Cognitive Levels / Mathematical Understandings 


1991                                        2000 
(Cognitive Levels)            Percent       (Mathematical Understanding)   Percent 
                              Emphasis                                     Emphasis 
Multiple-Choice,                            Multiple-Choice, 
Numerical-Response                          Numerical-Response, 
                                            Written-Response 
* Knowledge                   8             * Procedure                    34 
* Comprehension               26            * Concepts                     30 
* Application                 32            * Problem-solving              33 
* Higher Mental Activities    14 
Written-Response 
* Comprehension,              20 
  Application, Mental 
  Activities 


Appendix 2-2 


COMPARISON OF STUDENTS’ WRITING 


English 30 Diploma Examinations 
January and June 1991 and June 2000 


Alberta 


LEARNING 


ACHIEVEMENT OVER TIME 
Abstract 


The 2000 Achievement-Over-Time Study compares the writing of English 30 students in June 
1991 and June 2000 at two standards: Satisfactory (a score of 3 out of 5) and Excellent (5 out 
of 5). The study includes comparisons of assignments, scoring criteria, example papers, and 
rationales; a qualitative study that describes the characteristics of student writing in 1991 and 
2000; and a quantitative study. In the quantitative study, 100 English 30 essays written in 1991 
were scored according to standards used in June 2000. The results of the comparisons and 
qualitative study supported a conclusion that was confirmed by the quantitative study. 


The comparisons indicate that in June 1991, essays in which detailed knowledge was presented 
in a traditional format, in language that conformed to the conventions of the time, were 
rewarded; whereas, in June 2000, essays that demonstrated independent thinking and control of 
matters of correctness, presented through original approaches, were rewarded. As a result, the 
standards for student writing in 2000 represent a demonstrable increase over the 1991 standards, 
and the quality of student writing has risen accordingly. Although the Satisfactory essays 
demonstrated relatively little change from 1991 to 2000, students at the Excellent level 
demonstrated more variety in ideas, organization, and style in their writing. Students at both 
levels generally made an effort to edit their writing. The statistical results reinforce the 
conclusion that students responded to higher standards by writing more effectively. 


Introduction 


Provincial examinations were reintroduced in 1984. Since then, marks from diploma 
examinations have accounted for 50% of a student’s final mark. The remaining 50% is established by 
the school-awarded mark. The English 30 examinations comprise two sections so that 
reading and writing skills are assessed separately. The writing component is valued at 50% of 
the examination mark. 


Periodic achievement-over-time studies of student writing in English 30 have been based on the 
major writing assignment in the written-response section of the examination. Each paper 
receives two independent readings by trained teacher-markers who score the essay according to 
criteria that describe features of the writing in five-point scales. 


Comparisons are made at two standards: Satisfactory (3 out of 5), which represents writing at a 
level acceptable for students seeking graduation, and Excellent (5 out of 5), which represents 
outstanding writing. The 1990 Achievement-Over-Time Study compared 1984 and 1990 English 
30 essays and found that the expectations reflected in the scoring criteria had increased, and that 
improvements, especially at the Satisfactory level, had been dramatic. Unlike the 1990 study, in 
which Social Studies 30 and English 33 were also considered, the current study was concerned 
only with writing in English 30. 


Procedure 


Since January and June diploma examinations are made available to students for practice, it is 
not possible to check student achievement by readministering previous examinations. In 
addition, writing tasks change with each test and criteria gradually change. Comparisons must 
establish whether apparent changes reflect actual changes in student performance or differences 
in the examinations, standards, or marking procedures. For example, in 1991, each paper 
received three independent readings by markers; if there was significant disagreement in scores, 
the paper received a fourth reading. In 2000, each paper received two readings and, if there was 
significant disagreement, a third reading. Observation of student writing on the shorter of two 
assignments revealed a shift from emphasis on personal application of ideas to emphasis on 
supported analysis of ideas. Such change allows for continuous improvement of the examination 
and scoring, but makes direct comparison over time challenging. 


In 1991, students read a poem as a basis for writing a “Minor Assignment: Personal Response to 
Literature,” which required them to relate a quotation from the poem to their “own experience.” 
Students then wrote a longer “Major Assignment: Literature Composition.” In 2000, students 
read a poem and wrote a “Reader’s Response to Literature Assignment,” which required them to 
use “specific detail from the poem” to support a conclusion that they drew from the poem. Then, 
students wrote a longer “Literature Composition Assignment.” While the shorter assignments in 
both years served a similar function—stimulating thinking about the topic of the longer 
assignment—the differences in approach made the shorter assignment unsuitable for use in the 
study. The shorter assignment is valued at 15% of the total examination mark, whereas the 
longer assignment is valued at 35%. For these reasons, the Achievement-Over-Time Study 
examined only writing in the longer assignment that is now called the “Literature Composition 
Assignment.” 
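
The weights described in this report can be combined to show how much each writing assignment contributes to a student's final course mark. The Python sketch below does that arithmetic using only the percentages stated here (diploma examination worth 50% of the final mark; shorter assignment worth 15% and longer assignment worth 35% of the examination mark). The variable names are illustrative and are not taken from any Alberta Learning document.

```python
# Percentages as stated in the report (whole numbers keep the arithmetic exact).
EXAM_PERCENT_OF_FINAL = 50      # diploma examination share of the final mark
SHORTER_PERCENT_OF_EXAM = 15    # shorter assignment share of the examination mark
LONGER_PERCENT_OF_EXAM = 35     # longer assignment share of the examination mark

# The two assignments together make up the writing component of the examination.
writing_percent_of_exam = SHORTER_PERCENT_OF_EXAM + LONGER_PERCENT_OF_EXAM

# Scale each assignment by the examination's share of the final course mark.
shorter_percent_of_final = SHORTER_PERCENT_OF_EXAM * EXAM_PERCENT_OF_FINAL / 100
longer_percent_of_final = LONGER_PERCENT_OF_EXAM * EXAM_PERCENT_OF_FINAL / 100

print(f"Writing component: {writing_percent_of_exam}% of the examination mark")
print(f"Shorter assignment: {shorter_percent_of_final}% of the final mark")
print(f"Longer assignment: {longer_percent_of_final}% of the final mark")
```

On these figures the longer assignment, the only one examined in this study, accounts for 17.5% of a student's final mark.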




Study Questions 
The current Achievement-Over-Time Study sought to answer the following questions. 


1. What score would English 30 papers written in 1991 receive if they were 
scored in accord with the standards used in 2000? 


2. Have standards for students’ writing in English 30 changed between 1991 
and 2000? If so, how? 


Reviewers 


The Achievement-Over-Time study group was composed of seven educators with experience 
assessing student achievement. 


Sandra Erickson (M.E. LaZerte High School) 
Margaret Hadley (University of Calgary) 

Carol Mayne (St. Joseph Composite High School) 
Lyle Meeres (retired) 

Harvey Stables (Irma School) 

Dwayne Stewart (retired) 

Jean Watt (Jasper Place High School) 


The study group was assisted by staff from the Learner Assessment Branch of 
Alberta Learning. 


Elana Scraba (Assistant Director) 
Stephen Mitchell (Analytic Services) 
Mary Lou Campbell (Humanities’ Specialist) 


Process of Study 


To answer the study questions, the reviewers first compared the assignments, scoring criteria, 
example papers, and rationales from the June 1991 and June 2000 examinations. Second, the 
reviewers undertook a qualitative review in which they described the characteristics of student 
writing in 1991 and 2000 to discern changes. Finally, a quantitative study was conducted: 1991 
writing was compared with 2000 writing by scoring a sample of 1991 papers with the 2000 
criteria. 




Comparison Study 


Comparison of Assignments 

In both 1991 and 2000, students first read a poem, wrote a short assignment to initiate thinking 
about a topic relevant to the poem, and then wrote the longer assignment that is the subject of 
this study. Each longer assignment had a preamble, a question framed in a box, and a set of 
“Guidelines for Writing.” (See Appendix A.) 


A comparison of the assignments reveals that the way in which the actual examination question is 
presented has not changed significantly, but that the introduction to the topic has changed in 
important ways. The reviewers deemed the 1991 poem more challenging than the poem used in 
2000, but this difference was counterbalanced by the 1991 preamble, which repeated key words 
and suggested a thesis that students often developed in their essays. The 2000 preamble is a 
single short statement that identifies the writing topic without suggesting a possible essay thesis. 
Reviewers felt that the 1991 topic, “imagination,” was challenging, but that students received 
more assistance: not only did the 1991 preamble provide students with a possible thesis, but 
the guidelines for the 1991 assignment also listed literary elements such as setting, irony, 
contrast, conflict, imagery, and symbol as reminders for students. Students often used these 
literary elements to organize their essays. However, the study group noted this supporting 
information, which may have helped average students, may also have limited exceptional 
students’ development of ideas. 


The 2000 assignment encouraged students to explore their own ideas about the topic 
“perseverance.” Reviewers thought that the relative lack of supporting information was 
challenging for average students and liberating for exceptional students. The study group 
thought that the 1991 assignment implied that there was one right idea and one way to organize 
the essay; whereas, the 2000 assignment implied the validity of different ideas about the topic 
and different ways to develop those ideas. 


Comparison of Scoring Criteria 

Markers score essays on five-point scales that range from 1 (Poor) to 5 (Excellent). An 
insufficient response or no response is scored as 0. In this study, only essays scored as 
Satisfactory (3 out of 5) and Excellent (5 out of 5) were compared. (See Appendix B for the 
scoring categories and criteria used in 1991 and 2000.) 


The criteria by which students’ essays are assessed reflect the curriculum. In 1991, essays were 
assessed in five categories: Total Impression, Thought and Detail, Organization, Matters of 
Choice, and Matters of Convention. In 2000, four categories were used: the category “Total 
Impression” had been eliminated because it was felt that its criteria did not discriminate 
effectively. As well, “Matters of Convention” had been renamed “Matters of Correctness” to 
increase emphasis on the importance of editing. 


Some changes made to the scoring criteria between 1991 and 2000 encouraged students to 
develop their own ideas. In 1991, the criteria for assessing Thought and Detail refer to “carefully 
chosen details”; whereas, in 2000, the criteria had been broadened to “carefully chosen 
evidence.” This change allowed for a wider range of support to be used. In 1991, the descriptors 
for Satisfactory (3) writing required “appropriate details”; in 2000, a more demanding descriptor 
required “purposefully chosen evidence.” Other changes to the 2000 criteria clarified 
descriptors. The study group felt that the general effect of more specific descriptors was that the 
2000 criteria were more demanding. 


Changes to the descriptors of the Organization category further dispelled the idea that there is 
one formula for producing the best structure for an essay. The 2000 criteria refer to 
“organizational choices.” This use of the plural suggests that there is flexibility in the way that 
an essay can be focused and shaped. However, students writing Satisfactory (3) essays in 1991 
and 2000 generally structured their essays similarly. Students writing the 2000 Excellent (5) 
essays used a wider variety of organizational approaches to the assignment since the revised 
criteria allowed them to devise original approaches such as arriving at a thesis by inductive 
means as the essay concludes. 


In the category Matters of Choice, criteria assess students’ style of expression, including diction 
and syntax. Whereas the 1991 criteria referred to “tone” and “purpose,” the 2000 criteria 
referred to “effectiveness” and “voice.” The term “voice” suggests that the style is distinctive 
enough to characterize the individual behind the words. In 2000, more of the Excellent (5) 
essays reflected an effort to take more risks than in 1991. This change may be the result of 
teachers emphasizing the criteria with their students. Descriptors in the Matters of Choice and 
Matters of Correctness categories were revised following the 1993 “Conventions of Language 
Study.” 


In 1991 and 2000, markers were directed to consider the complexity and length of essays when 
considering correctness. An examination of the descriptors for Matters of Correctness revealed 
that criteria used in 2000 were more specific than in 1991. In 1991, markers were directed to 
consider correctness of mechanics (spelling, punctuation, capitalization) and grammar. The 
criteria used in 2000 included consideration of sentence construction (completeness, consistency, 
subordination, coordination, predication) and usage (accurate use of words according to 
convention and meaning). In 2000, the focus on the degree to which students controlled 
correctness was greater than it had been in 1991. Thus, the description of the Excellent (5) 
essay includes reference to “confidence in control,” the description of the Satisfactory (3) essay 
to “control of the basics,” and the description of the Poor essay to a “lack of control.” In 2000, 
the criteria emphasized the clarity of communication; whereas, in 1991, the criteria stressed the 
frequency of errors. The study group felt that this increased specificity and focus on control and 
clarity indicate an increase in what is expected in responses to the assignment. More is expected 
from student writing in 2000. 


In 1991, students used memorized literary detail as a substitute for their own considered ideas; 
whereas, in 2000, more students are attempting to articulate their own ideas without using as 
much quoted material. The comparison of Satisfactory (3) and Excellent (5) papers suggests that 
the 1991 assignment on “imagination” encouraged detailed responses covering the literature; 
whereas, in 2000, the focus was on ideas about “perseverance.” 




The changes in criteria between 1991 and 2000 had a significant, positive influence on student 
writing. The increased expectations represent a change in standard that appears to have had an 
impact on teaching and learning, as evidenced by students’ writing, particularly at the 
Excellent (5) level of writing. Excellent (5) essays in 2000 reflect emphasis on expressing 
original thinking and presenting it through a variety of approaches to organization and in a 
carefully chosen style. If 1991 essays resembled “paint by numbers,” essays in 2000 reflected 
individuals painting their own pictures. 


Example Papers and Rationales 

In 1991 and 2000, markers were given sample papers representing typical essays at each of the 
five levels of achievement, and a rationale explaining why these essays represent typical writing 
at that level. The study group compared the example papers and rationales to determine 
differences in standards between the two years. 


The comparison revealed that the 1991 and 2000 example essays demonstrated comparable 
writing. However, students writing the 1991 essays tended to be more thorough when dealing 
with details from literature, which reflects the requirement in the criteria for “carefully chosen 
details.” The study group noted, though, that the 1991 essays were more reliant on plot details, 
sometimes with unexplained relevance, and in general, the organization of details was more 
mechanical and predictable. In 2000, some students became so involved in examining broad 
ideas from literature that they provided less detail, although the 2000 essays did contain a wider 
range of evidence than did those of 1991. The 2000 essays were somewhat more likely to refer 
to character development and themes found in the literature, which reflected the shift in 
standards. (See Appendix C.) In both 1991 and 2000, some students successfully balanced ideas 
and evidence. The changes in criteria and guidelines used in 2000 assisted students writing at 
the Excellent (5) level, but not students writing at the Satisfactory (3) level since these students 
depend more upon structure and formulaic approaches to writing. 


Qualitative Review 


Process 

The qualitative review was a descriptive study that examined the characteristics of student 
writing in English 30 in 1991 and 2000 to discern which changes, if any, occurred. Comparisons 
were made from sample papers at two standards, Satisfactory (3) and Excellent (5). Reviewers 
described sample papers in three general categories designed to reflect curriculum expectations: 
Thoughtfulness, Effectiveness, and Correctness. The study group’s focus was on the 
characteristics of the writing, not on grading the papers, since only papers that had received 
either Satisfactory (3) or Excellent (5) throughout were considered. In both the 2000 essays and 
the 1991 essays, the sample consisted of essays that had received Satisfactory (3) or Excellent (5) 
in all categories. 




Results of the Review of June 2000 Essays 

The June 2000 Excellent (5) essays revealed a broad range of content and a variety of 
approaches. The sample included many essays with profound controlling ideas. At this level, 
students generally dealt with the universality of the ideas they discovered in the literature. In 
June 2000, students organized and developed their ideas, sometimes creating analogies to clarify 
points. Their organization of complex ideas was often impressive. Essays that were assessed as 
Excellent (5) in 2000 reveal that students are developing impressive capabilities with language. 
Sentence structures in the Excellent (5) essays created interest through variety and emphasized 
important ideas. Some students used idiomatic expressions which, although understandable in 
today’s informal world, indicate a lack of knowledge of appropriate levels of formality. Many 
students were able to sustain a coherent and logical development throughout their essays, some 
of which were quite long. Many students effectively embedded quotations in their writing. Most 
Excellent (5) essays were very original in both ideas and in the use of language to create effects. 
The variety of ideas and approaches in the 2000 Excellent (5) essays suggests that students have 
developed their own ideas using a distinct voice. 


Generally, the June 2000 Satisfactory (3) essays followed a chronological approach and students limited discussion to three 
examples. Introductions and conclusions tended to describe issues in absolute terms or to offer 
blunt, oversimplified, moral dictates. However, the Satisfactory (3) essays presented logical 
arguments through appropriate examples. Students’ language choice and use was clear. 
Expressions as well as ideas were sometimes clichéd. Most writing flowed, but transitions were 
frequently mechanical. Many essays were lengthy, and details revealed a sincere attempt to 
present an effective argument. Most students made a clear effort to edit their essay. Generally, 
students appeared to select simple diction and syntax, presumably to maintain basic control of 
the writing. 


Results of the Review of June 1991 Essays 

The Excellent (5) papers of 1991 were scholarly, clinical, and objectively distant; rich in detail; 
and based clearly on literature that had been studied and analyzed. At times, detail seemed to be 
regarded as so important that it was presented at the cost of ideas. For example, discussion of the 
universality of the literary idea was occasionally secondary to proving that the idea appeared in 
the literature. The organization of these essays was sound rather than striking, possibly because 
the papers were carefully structured in mechanical, traditional patterns. Diction was carefully 
chosen but less effective than was diction in the 2000 Excellent (5) papers. The study group felt 
that students were less passionate about the ideas and less involved in writing than their 2000 
counterparts. Nonetheless, these papers were generally sophisticated and demonstrated strong 
ability in terms of literary analysis. 


Students who received Satisfactory (3) in 1991 presented considerable detail, though it was 
sometimes scattered, sometimes redundant, and inconsistently purposeful. There was much 
retelling of the story. Although most of the detail supported the generalizations, it sometimes 
blurred the idea. Students sometimes failed to relate literary characters to real-life people and 
drew conclusions that were too specific to the literature, thereby limiting the universality of the 
students’ ideas. Some students failed to draw a conclusion from the detail, leaving it to the 
reader to “interpret” the content; others presented a clear conclusion that appeared to be a forced 
return to the topic. Arguments tended to appear as separate, so there was little sense of unity of 
the whole. Order was sometimes imposed by switching from one literary example to another or 
by changing from one literary element to another. Choices appeared to be made to keep content 
and language simple in order to avoid errors in correctness. 


Conclusion 

The study group concluded that more demanding criteria in 2000 led to more effective writing, a 
consequence especially visible in the Excellent (5) essays. The changes in criteria increased the 
standard in 2000. The 2000 Excellent (5) essays show that the quality of student writing has also 
increased. 


Quantitative Review 


Process 

Essentially, the process for the quantitative review consisted of scoring 1991 papers using 2000 
standards to see if standards and achievement had changed. Prior to scoring the 1991 papers 
with the 2000 criteria, reviewers carefully studied the 2000 criteria under the guidance of readers 
who had just completed the summer marking session. Reviewers were then given a randomized 
set of sample 1991 papers that had received Satisfactory (3) and Excellent (5) scores. Reviewers 
scored these papers according to the 2000 criteria. The scores assigned were then compared with 
the scores assigned in 1991. 


Results 

The resulting data show that at the Satisfactory (3) level, there was little difference between 1991 
and 2000. The 1991 essays were slightly lower. However, the data show that 1991 papers that 
had received an Excellent (5), when scored by the 2000 standards, would more likely have 
received a Proficient (4) or a score between Proficient (4) and Excellent (5). This indicates that 
the standard at Excellent (5) has increased during the nine-year period. Although the Excellent 
(5) 1991 essays were still considered very strong according to 2000 standards, they would have 
received somewhat lower scores in 2000. (See Appendix D.) Scores on the Excellent (5) papers 
changed most in the categories of Thought and Detail, and Organization. Scores on both the 
Satisfactory (3) and Excellent (5) papers changed least in the category of Matters of 
Correctness. The results from scoring the 1991 papers with the 2000 criteria coincide with the 
results from comparisons made in the review of criteria and the results in the qualitative review: 
more is now expected of students, and students now produce more effective writing. 


Summary of Findings 


Assignments and criteria have changed since 1991, and achievement has changed accordingly. 
Although there is a demonstrable increase in achievement at the Excellent (5) level and a 
demonstrable increase in the quality of student work at Excellent (5), the most dramatic 
differences in student work between 1991 and 2000 were in the ways in which students 
approached their essay writing. An Excellent (5) essay is still an Excellent (5) essay, but 
Excellent (5) essays in 2000 were more likely to present the ideas of the individual writer rather 
than to reflect the ideas of others. Excellent (5) essays in 2000 were more varied in organization, 
as writers followed thoughts to a conclusion that sometimes evolved through inductive 
reasoning. In 2000, many of the best writers were clearly involved in the task; whereas, the 1991 
writers were more likely to present quantities of detail in a traditional organization. In 2000, 
students used a wider range of literary examples as they sought support for their ideas. The 
writing skills in the Excellent (5) essays in both years reflected confidence and competence. 


Essays at the Satisfactory (3) level showed no demonstrable change in achievement: a 
Satisfactory (3) essay in 1991 was a Satisfactory (3) essay in 2000; however, there were some 
changes in the characteristics of the writing as students in 2000 strove to achieve a more 
independent expression of ideas. Students achieving at the Excellent (5) and Satisfactory (3) 
levels are conscientiously trying to write effectively. Students writing Excellent (5) essays are 
more personally involved in the challenge, and their independent thinking results in papers that 
are interesting to read. Eventually, students need to develop an awareness of levels of formality 
in writing so that they produce essays that are consciously crafted to suit their purpose and 
audience. As the Satisfactory (3) essays reveal, the movement toward independent thinking is 
incomplete, although a continuing trend toward autonomy and student ownership of writing was 
observed. 


The various facets of the Achievement-Over-Time study comparing 1991 student essays and 
2000 student essays support the same conclusion: standards of writing in English 30 have risen 
and achievement in writing has risen accordingly. 


APPENDIX A 
Written-Response Assignments 
1991 and 2000 


APPENDIX B 
Scoring Categories and Criteria 
1991 and 2000 


APPENDIX C 
Examples from Student Writing 
1991 and 2000 Satisfactory (3) and Excellent (5) Essays 


1991 Examples of “appropriate details” in Satisfactory (3) Papers 
* “Nothing his Grandfather said bothered him” 

* “Every time he came back he had a different job” 

* “Vanessa believed everything that Chris told her” 


1991 Examples of “carefully chosen details” in Excellent (5) Papers 

* “She follows a delightful schedule, which includes going to the Jardins Publiques to 
enjoy the band” 

* “She has decided to wear her fur—-a wonderful little rogue which rests on her shoulders, 
biting its tail” 

* “She sits near a fine old man in a velvet coat and a woman with a roll of knitting on her 
embroidered apron” 


2000 Examples of “purposefully chosen evidence” in Satisfactory (3) Papers 

* “He has just started working with his father, chopping trees all day in the bitter cold 
north, all the while his father pressuring him to be a ‘man’” 

* “He is forced to borrow money from his neighbor in order to survive. He has to deal 
with the fact that he failed at his career, and his dream” 


2000 Examples of “carefully chosen evidence” in Excellent (5) Papers 

* “The closer the oppressed people come to desolation and death, the harder they push, 
showing great resourcefulness and steadfastness” 

* “Haggars high expectations eventually cause John to rebel against her ideals and model 
himself after his father, Bram” 

* “Her relationship with Marvin consisted of one long endless struggle, instead of the 
loving close relationship that was possible” 




APPENDIX D 
Table 1 
Recurring Features of 1991 and 2000 Satisfactory (3) and Excellent (5) Papers 


Recurring Features of 1991 
Satisfactory (3) Papers 


Thoughtfulness (Ideas/Content) 

* generally clear controlling idea 

* inconsistent direction as focus falters 

* superficial, broad thesis sketchily 
developed 

* conventional, basic ideas supported by 
plot details 

* appropriate and relevant but simple 
examples 

* manipulated literary details or superficial 
connection to the topic 

* literal understanding limits ability to see 
applications or universality 

* thesis and development limit discussion 

* appropriate literature chosen 

* issue seen in absolute terms 

* writing is task-oriented 


Recurring Features of 2000 
Satisfactory (3) Papers 


Thoughtfulness (Ideas/Content) 

* generally clear controlling idea 

* focused on topic but only adequate 
development 

* superficial or simplistic grasp of applied 
literary meaning 

* conventional, basic ideas supported by 
plot details 

* appropriate and relevant but generalized 
examples 

* manipulated literary details or superficial 
connection to the topic 

* failure to grasp author’s intent despite 
fundamental grasp of literature 

* complex development attempted but not 
sustained 

* appropriate literature chosen 

* issue seen in absolute terms 

* writing is task-oriented 


Recurring Features of 1991 
Satisfactory (3) Papers (continued) 


Effectiveness 

* language is basic, straightforward, and 
risk-free 

* informal language appears 


* functional, methodical analysis in a 
predictable, chronological order 

* controlled development, reliance on plot 

* transitions, if present, tend to be 
mechanical or ineffective 

* coherence falters when close reading 
distracts focus 

* order is controlled through a simple 
pattern of development 

* functional but abrupt conclusion 


* diction is general or includes a mix of 
precise and inaccurate usage 

* quotations used to impress rather than 
support ideas 

* complex structures are awkwardly 
constructed 

* detached, uninvolved, and lacking voice 


Correctness 

Content 

* generally accurate understanding of 
literature, sometimes with minor 
inaccuracies 

* details tend to be accurate but simple and 
plot-based 


Conventions 

* clear, generally controlled language 

* few risks taken 

* occasional errors in spelling, tense, 
punctuation, and usage 

* some comma splices and some 
incomplete sentence constructions 

* some repetitive diction 

* some attempts to edit writing 


Recurring Features of 2000 
Satisfactory (3) Papers (continued) 


Effectiveness 

* language is basic, straightforward, and 
risk-free 

* informal language appears; readers 
spoken to in second point of view 

* functional, methodical analysis in a 
predictable, chronological order 

* controlled development, reliance on plot 

* transitions, if present, tend to be 
mechanical or ineffective 

* coherence falters when complexity of 
literature increases 

* order is controlled through a simple 
pattern of development 

* simple return to controlling idea in 
conclusion 

* diction is general or includes a mix of 
precise and inaccurate usage 

* tone reflects respect for literature 


* complex structures are awkwardly 
constructed 

* may demonstrate involvement but 
develop ineffective voice 


Correctness 

Content 

* generally accurate understanding of 
literature, sometimes with minor 
inaccuracies 

* details tend to be accurate but described 
generally 


Conventions 

* clear, generally controlled language 

* some risks taken 

* occasional errors in spelling, agreement, 
punctuation, and usage 

* some comma splices and some awkward 
sentence constructions 

* expression may be wordy 

* frequent attempts to edit writing 




Recurring Features of 1991 
Excellent (5) Papers 


Thoughtfulness (Ideas/Content) 

* perceptive insights arise from analysis of 
literature 

* relatively safe theses supported by 
purposeful choice of details 

* abundance of precise supporting detail, 
often including direct quotations 

* clear evidence of knowledge of literature 

* content is appropriate but extensive 
details make development ponderous 

* complexity in literary ideas recognized 

* engagement with literature is objective 

* formal and scholarly development 
though sometimes vividly detailed 

* compares themes and characters while 
generally maintaining a precise focus on 
the controlling idea 

* confidence demonstrated through 
convincing support 

* purpose is to inform an audience 

* prefers to remain within the text 


Effectiveness 

* concise, clear, fluent writing 

* generally coherent or logical, tightly 
focussed discussion 

* some inflated rhetorical flourishes 

* structured around skillful comparison, 
chronology of events, character study, 
literary elements, or themes 

* some sound, functional organization 


Recurring Features of 2000 
Excellent (5) Papers 


Thoughtfulness (Ideas/Content) 

* perceptive insights arise from 
internalized appreciation of literature 

* alternatives explored and ideas 
considered from several angles 

* carefully selected evidence supports 
ideas that are explicitly and implicitly 
developed 

* clear evidence of knowledge of literature 

* exploration of levels of thought engage 
readers 

* complexity in literary ideas recognized 
and appreciated 

* engagement with literature is personal 

* original or personalized approach shows 
willingness to take risks to develop ideas 

* compares themes and characters while 
generally maintaining a precise focus on 
the controlling idea 

* confidence demonstrated through 
willingness to develop unconventional 
arguments 

* purpose is to develop independent 
thought 

* author’s purpose and craft appreciated 


Effectiveness 

* clear, occasionally polished, fluent 
writing 

* generally coherent or logical, tightly 
focussed discussion 

* some inflated rhetorical flourishes and 
some informal language 

* structured around skillful comparison, 
chronology of events, character study, 
literary elements, or themes 

* some conventional organizations but 
uncommon approaches appear 


Recurring Features of 1991 
Excellent (5) Papers (continued) 


Effectiveness (continued) 

* some uncertain connections between the 
controlling idea and details 

* some introductions pose thoughtful 
question 

* objective tone 

* lack of risk taking 

* effective, precise diction 

* effective variety of sentence structures 

* effective conclusions 

* facility in the use of quotations 

* simple transitions 

* lack of enthusiastic involvement 

* ownership appears as knowledge of the 
literature 


Correctness 

Content 

* sophisticated, thorough knowledge of 
literature 

* accuracy appears to be a major goal 

* comprehensive understanding 

* valid analytical interpretations 


Conventions 

* excellent control of correctness of 
language 

* infrequent minor errors do not affect 
clarity of meaning 

* the correct use of language is impressive 
considering the complexity and length of 
the essay, and the circumstances 

* essays are often essentially error-free 

* shows evidence of editing 


Recurring Features of 2000 
Excellent (5) Papers (continued) 


Effectiveness (continued) 

* some uncertain connections between the 
controlling idea and details 

* some interesting introductions use 
analogy or prepare way for extended 
metaphor 

* voice present through rhetorical 
structures, tone, or wit 

* confidently unconventional 

* effective, precise diction 

* effective variety of sentence structures 

* effective conclusions 

* facility in the use of quotations 

* transitions vary from simple to polished 

* passionately engaged in task 

* ownership appears in both ideas and 
language 


Correctness 

Content 

* knowledge and internalized 
understanding of literature 

* relative absence of error 

* deceptively simple compared with level 
of thinking 

* correctly preserves author’s intent 


Conventions 

* excellent control of correctness of 
language 

* infrequent minor errors do not affect 
clarity of meaning 

* the correct use of language is impressive 
considering the complexity and length of 
the essay, and the circumstances 

* essays are often essentially error-free 

* shows evidence of editing 


APPENDIX E 
Table 2 
Key Features of 1990 and 1984 Satisfactory (3) and Excellent (5) Papers 


APPENDIX F 
Table 3 
1991 Papers Scored by 2000 Criteria 


Appendix 2-3 


CONVENTIONS OF LANGUAGE STUDY 


English 30 Diploma Examinations 
June 2000 


Alberta Learning 




The Study 

The study conducted in the summer of 2000 compared errors in essays written by students for the 
June 2000 English 30 Diploma Examination with findings from a similar, but not identical, study 
done in 1993. 


Those unfamiliar with the English 30 Diploma Examination need to know that the assignment in 
question requires students to write a literary essay discussing a given theme relating to literature 
that the students select. Teacher-markers score the essays in four reporting categories: Thought 
and Detail, Organization, Matters of Choice (style), and Matters of Correctness (correctness of 
sentence construction, usage, grammar, and mechanics). 


For purposes of this study, the only reporting category considered was Matters of Correctness. 
In 1993, this reporting category was known as Matters of Convention. The name was changed 
after a 1993 committee recommended that the criteria for this reporting category be expanded 
and made more specific. 


Each essay is scored according to criteria that describe features of the writing in five scale 
scores: Excellent (5), Proficient (4), Satisfactory (3), Limited (2), and Poor (1). The reporting 
category being considered in this study, Matters of Correctness, contributes 7.5 of the possible 
35 marks. Each paper is read and scored independently by two teachers, whose scores are 
combined to produce the student’s mark on the essay. 


Key Questions 

The 2000 study sought to answer the following questions related to both the 1993 essays and 
those written in 2000 receiving scores of Satisfactory (3) in the reporting category Matters of 
Correctness: 



* What types of errors in language and expression are common in English 30 papers? 

* How many errors are typical in such papers? 

* What is the relative complexity and length of such papers? 

* Is the score awarded appropriate? 


The 2000 study sought to answer the following additional questions related to the essays written 
in 2000 and receiving a score of Excellent (5) in the reporting category Matters of Correctness: 


* Are the types of errors found in those papers receiving Excellent (5) the same as those found 
in those papers receiving Satisfactory (3)? 


* Is the awarded score of Excellent (5) appropriate? 


The Sample 
In the 2000 study, two groups of papers written in June were examined: 




Group One: a sample of 211 papers that had received Satisfactory (3) in all reporting 
categories from two markers 


Group Two: a sample of 86 papers that had received Excellent (5) in all reporting 
categories from two markers 


The 2000 sample differs from that of the 1993 study, in which three groups of papers written in 
both January and June were examined: 


Group One: a sample of 100 papers that had received scores of Satisfactory (3) in all 
reporting categories from three markers 


Group Two: a sample of papers that had received scores of Satisfactory (3) from three 
markers in the reporting category Matters of Convention but scores of Limited (2) on the 
reporting category Thought and Detail 


Group Three: a sample of papers that had received scores of Satisfactory (3) in the 
reporting category Matters of Convention, but Proficient (4) in the reporting category 
Thought and Detail 


The Committee 


In July 2000, the committee consisted of two members who had participated in the 1993 study 
and four who had not. 


The Process of the 2000 Committee 

The 2000 committee began by reviewing the 1993 study, confirming the current standards 
inherent in the scoring criteria, and setting the direction for a comparison of the standards 
between 1993 and 2000. The committee applied the 1993 scoring criteria to papers from both 
1993 and 2000 to ensure that they were characterizing errors in the same way as had been done 
in 1993. 


The committee discovered that applying the scoring criteria used in 1993 was more difficult than 
they had anticipated. Unlike the 1993 committee, which had developed the scoring criteria, the 
2000 committee members had to be consistent not only with each other but also with the 1993 
committee. 


The 1993 committee developed a grid to classify errors in the papers that were identified for the 
study. The grid listed specific errors under six headings: Sentence Structure/Construction, 
Punctuation, Pronouns, Verbs, Usage, and Spelling. Sub-categories were listed under the 
specific errors. 


The sub-categories were eliminated because they seemed to overlap with Shift in Verb Tense and 
with Apostrophes, respectively, and because, after 1993, Matters of Convention had been 
renamed Matters of Correctness. The 2000 reviewers thought that assessment of the conventions 
of formal literary analysis (conventions such as use of the literary present and avoidance of 
contractions, abbreviation, slang, and the personal pronouns “I” and “you”) properly belonged 
under Matters of Choice, not Matters of Correctness. The committee agreed unanimously. 


The error category Wrong Tense was defined in 2000 as Wrong Tense/Form of the Verb (“Biff 
has came to realize that his father is living in a dream”, “The task of revenge falled to Hamlet”, 
“The literary work that I have chosen is a book wrote by Margaret Laurence”). 


The 1993 error category Shift in Construction was defined as Shift in Construction or 
Incomplete Construction to cover not only mixed metaphors and shifts in subject, mood, or 
voice, but also those instances when students omitted words such as articles, prepositions or 
conjunctions, or failed to complete their thought as in: “I think that the author gives different 
obstacles to allow the characters to overcome” and “Willy is still giving all his effort in his 
ability to be a salesman, and the constant discouragement of not being able to make sales quotas 
even to the extreme of losing his job.” 


In 1993, the error category Shift in Construction was also differentiated from Faulty Parallelism 
and from Shift in Verb Tense and Shift in Point of View. The committee agreed to limit Faulty 
Parallelism to shifts in construction within lists (“She has money, a large estate, and is happily 
married.” “Unlike Olivia, the narrator has found peace in India, feels accepted, and that she 
belongs”) and shifts in construction between ideas joined by correlative conjunctions (such as 
not only...but also or whether...or). 


For the purpose of the 2000 study, the committee also clarified the error categories Homonyms, 
Wrong Word, Incorrect Semantic Relationship, and Spelling as follows. 


The error category Homonyms was limited to those words that sound the same but are 
spelled differently such as “bear” and “bare” or “sight” and “site.” 


The error category Wrong Word was used to cover malapropisms (“The first character 
that I would like to overlook is Iago.” “Winston suffered the punishment sequestered on 
thought criminals such as himself.” “Laertes and Hamlet engage in a friendly duo.” 
“Robert replies with a response that profounds her.”), commonly confused words (“Willy 
Loman installs the wrong set of values in his sons.”) and incorrect or non-existent word 
forms (such as “perseverant,” “disobeyance” for “disobedience,” or “like” used instead of 
“as if”). 


The error category Incorrect Semantic Relationship was applied chiefly to non-idiomatic 
expressions and to combinations of words that were very unconventional or confusing. 
(“If they do not believe they can achieve their goals this discourages them to trying.” 
“Robert’s sanity was highly questionable by his actions later on” “One can only achieve 
one’s perseverance if the intentions are pure” “They persevere the foundation of what 
the story has to become in order for the readers to enjoy it”). 


The error category Spelling was reserved for those instances when the student was using 
the appropriate word but did not know how to spell it. (“Through out The Wars, Robert 
keeps many horses from parishing” “Claudius was more intelegent than Hamlet” “The 
men were sitting in the park competing with eachother” “When Marlow has Kurtz 
already to go back to normal civilization, he disappears”). 


The Confused Syntax error category became a catchall for those sentences that were so muddled 
in structure or thought that the reviewers either could not understand them or could not fit them 
into any other single category. (“She worked hard and always put Turtle number one priority 
was a perfect mother” “As a result, they remained to appear wealthy, yet Paul died” “The 
author uses symbolism to try to show the evil of this, I think the author wanted to show how 
perseverance in an evil way could be, you don’t die, the author, I think was just using this to 
show the evil in it”). 


Another issue the committee debated was how frequently to count the same error. They decided: 


* to count a specific spelling error (e.g. “perserverence” or “primative”) only once even when 
the error was repeated 


* to count shifts in verb tense only once when the student was largely consistent. If, for 
example, a student began his essay in the present tense, then shifted to the past tense and 
maintained it for the remainder of the essay, the student was assessed with only one error. If, 
however, he or she shifted inappropriately between one tense and another throughout the 
essay, then he or she was assessed with an error each time a new shift occurred 


* to count all other errors each time they occurred 
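

The three counting rules above can be sketched in code. What follows is a minimal Python illustration, not the committee's actual procedure: the function names, the representation of an essay's tenses as a list of labels, the case-insensitive matching of misspellings, and the treatment of every change of tense as an inappropriate shift are all assumptions made for this example. The reviewers judged appropriateness case by case.

```python
def count_spelling_errors(misspellings):
    """Rule 1: a given misspelling counts once, however often it recurs."""
    # Case-insensitive de-duplication is an assumption of this sketch.
    return len({word.lower() for word in misspellings})


def count_tense_shifts(tenses):
    """Rule 2: one error per shift, not per verb written after the shift.

    `tenses` is a hypothetical list of tense labels in essay order,
    e.g. ["present", "present", "past"]. Every change of tense is
    treated as an inappropriate shift here.
    """
    return sum(1 for prev, cur in zip(tenses, tenses[1:]) if prev != cur)


def count_other_errors(occurrences):
    """Rule 3: every occurrence of any other error counts."""
    return len(occurrences)


# A repeated misspelling counts once, so this list holds two errors.
spelling = count_spelling_errors(["perserverence", "primative", "perserverence"])

# One shift that is then maintained counts as a single error.
consistent = count_tense_shifts(["present", "present", "past", "past", "past"])

# Shifting back and forth counts each new shift.
erratic = count_tense_shifts(["present", "past", "present", "past"])

print(spelling, consistent, erratic)  # 2 1 3
```

Under these assumptions, an essay that moves once from present to past tense is charged one error, while an essay that alternates is charged once per shift, which matches the distinction the committee drew.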


The committee made a distinction in errors between redundancies and verbosity. Verbosity, the 
committee decided, would be assessed under Matters of Choice. Redundancy would be 
considered a simple repetition of an idea (“continue on persevering,” “return back,” 
“self-courage,” “more superior”). 


The Results 
In 2000, the Satisfactory (3) essays averaged 30.8 errors per paper compared with 28.0 errors per 
paper in 1993. The 2000 Excellent (5) essays averaged 19.8 errors. To obtain quantitative data, 
the reviewers counted errors, which is not the usual way of dealing with matters of correctness. 




Table A1: Error Range in Conventions of Language in "Satisfactory" 
June 2000 English 30 Diploma Exam Essays 


1-5 errors 
6-10 errors 
11-20 errors 
21-30 errors 


31-40 errors 


Average number of errors per paper: 30.8 


Table A2: Error Range in Conventions of Language in "Excellent" June 
2000 English 30 Diploma Exam Essays 


1-5 errors 
6-10 errors 
11-20 errors 


21-30 errors 
31-40 errors 


Average number of errors per paper: 19.8 


While the number of errors in both the Satisfactory (3) and Excellent (5) papers may appear 
large, it must be remembered that these papers are first-draft writing. The number of errors is 
also less significant than the nature of the errors. Although this study, like the previous one, 
produced quantitative results, its usefulness lies in the discussion of the qualitative features of the 
errors that students made. For example, five sentence construction or usage errors may interfere 
more seriously with communication than ten punctuation errors. 


Figure 1a shows the average number of errors made in each category per essay for Satisfactory 
(3) papers for both 1993 and 2000. 




Figure 1a: Average Number of Errors per Essay by Category 
- Satisfactory (3) for Language Conventions - 

[Bar chart; vertical axis: Mean Errors; horizontal axis: Category; series: 1993 and 2000 (2000: N=211)] 


Figure 1b shows a comparison between the average number of errors per category in the 
Satisfactory (3) and Excellent (5) papers in 2000. 


Figure 1b: Average Number of Errors per Essay by Category 
- Satisfactory (3) and Excellent (5) for Language Conventions - 

[Bar chart; vertical axis: Mean Errors; horizontal axis: Category; series: 2000 (3's): N=211 and 2000 (5's): N=86] 


Punctuation 


The largest number of errors in Satisfactory (3) papers occurred in Punctuation, with an average 
of 8.5 errors per paper in 2000 as compared with 7.7 errors per paper in 1993. The most 
numerous errors in both years were related to comma usage (their omission or inappropriate 
inclusion) and apostrophe usage (their omission or placement). Of the 2000 Satisfactory (3) 
papers, 10.9% had no comma errors, compared with 7.5% in 1993 and 23.3% of the 2000 
Excellent (5) papers. The Satisfactory (3) papers contained 10% more apostrophe errors in 2000 
than in 1993. The Excellent (5) papers, however, had significantly fewer apostrophe errors than 
the 2000 Satisfactory (3) papers; in fact, 67.4% of the Excellent (5) papers had no apostrophe 
errors. The Excellent (5) papers contained more colon and semicolon errors than the 2000 
Satisfactory (3) papers, a situation that may be explained by the fact that very few students who 
wrote Satisfactory (3) papers attempted to use either the colon or the semicolon. Because the 
long, complex sentence structure used in most Excellent (5) papers demanded complex 
punctuation, students understandably made frequent semicolon and colon errors. For the same 
reason, the Excellent (5) papers had more errors in the use of the dash. 


The frequency of punctuation errors in the Satisfactory (3) papers in both years suggests a lack of 
knowledge of the conventions of print, a lack of appreciation for the cadences and logical pauses 
of written language, and a lack of reading experience. 


This information about punctuation is potentially useful for classroom teachers, because direct 
instruction in the conventions of punctuation is likely to be productive when connected to the 
students’ editing and proofreading of their own writing. Even excellent young writers will 
benefit from such instruction, particularly in the appropriate use of colons, semicolons, and 
dashes. 
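

The incidence tables in this study report, for each error type, the percentage of papers having 0, 1-3, or 4 or more errors. A minimal Python sketch of that tabulation follows. The function name and the sample counts are hypothetical and are not data from the study; the sketch only illustrates how per-paper counts could be grouped into the three bands.

```python
def incidence_bands(counts):
    """Return the percentage of papers with 0, 1-3, and 4+ errors.

    `counts` holds one non-negative error count per paper for a
    single error type (for example, comma errors).
    """
    if not counts:
        raise ValueError("at least one paper is required")
    n = len(counts)
    bands = {"0 Errors": 0, "1-3 Errors": 0, "4+ Errors": 0}
    for c in counts:
        if c == 0:
            bands["0 Errors"] += 1
        elif c <= 3:
            bands["1-3 Errors"] += 1
        else:
            bands["4+ Errors"] += 1
    # Convert tallies to percentages of all papers, to one decimal place.
    return {band: round(100 * k / n, 1) for band, k in bands.items()}


# Hypothetical comma-error counts for ten papers (illustrative only).
sample = [0, 2, 5, 1, 0, 4, 3, 7, 1, 2]
print(incidence_bands(sample))
# {'0 Errors': 20.0, '1-3 Errors': 50.0, '4+ Errors': 30.0}
```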


Table 1a: Incidence and Type of Punctuation Errors in Essays (June 2000) 
Receiving Satisfactory (3) for Matters of Convention 


Percentage of Papers Having... 
| 0 Errors | 1-3 Errors | 4+ Errors 


Comma Error 

Colon Error 

Semicolon Error 

Capitals/Periods 

Apostrophe 

Quotation Marks 
Hyphenation/Dash/Parentheses 
Average number of punctuation errors: 8.5 




Table 1b: Incidence and Type of Punctuation Errors in Essays (June 2000) 
Receiving Excellent (5) for Matters of Convention 


Percentage of Papers Having... 
| 0 Errors | 1-3 Errors | 4+ Errors 


Capitals/Periods 

Apostrophe 

Quotation Marks 
Hyphenation/Dash/Parentheses 
Average number of punctuation errors: 4.8 


Sentence Structure/Construction 


As in 1993, the category with the second largest number of errors in 2000 was Sentence 
Structure/Construction. In fact, the average number of errors (6.6 per paper) was the same as in 
1993. 


In 1993, the most frequent errors were classified as Confused Syntax, whereas in 2000, the most 
frequent errors in both Satisfactory (3) and Excellent (5) papers were classified as Shift in 
Construction. Because the two categories are easily confused, however, the apparent change 
may reflect a change in the way reviewers were classifying errors as much as a change in the 
type of errors students made. A shift in construction frequently produces confusing syntax. 
(“By Marlow not giving up at the attack and persisting to go on against the others’ advice shows 
how devoted Marlow is to his work and at striving toward his goal”). 


Unlike errors in punctuation, errors in sentence construction can severely impede communication 
and usually reflect either confused thinking or a limited repertoire of syntactical alternatives. 
Consequently, no quick and easy remedies exist for muddled sentence structure. 




Table 2a: Incidence and Type of Sentence Structure/Construction 
Errors in Essays (June 2000) Receiving Satisfactory (3) for Matters of 
Convention 


Percentage of Papers Having... 


N=211 


Shift in Point of View            19.9% 
Shift in Construction             50.2% 
Misplaced or "wrong" Modifiers    40.3% 
Run-on Sentences                  20.4% 
Comma Splice                      40.3% 
Incomplete Sentences              40.8% 
Confused Syntax                   36.5% 
Faulty Parallelism                18.0% 

Average number of sentence structure/construction errors: 6.6 


Table 2b: Incidence and Type of Sentence Structure/Construction 
Errors in Essays (June 2000) Receiving Excellent (5) for Matters of 
Convention 


Percentage of Papers Having... 


                                 | 0 Errors | 1-3 Errors | 4+ Errors 

Shift in Point of View           |  93.0%   |            | 
Shift in Construction            |  43.0%   |   51.2%    |   5.8% 
Misplaced or "wrong" Modifiers   |  57.0%   |   40.7%    |   2.3% 
Run-on Sentences                 |  90.7%   |    9.3%    |   0.0% 
Comma Splice                     |  65.1%   |   29.1%    |   5.8% 
Incomplete Sentences             |  84.9%   |   15.1%    |   0.0% 
Confused Syntax                  |  82.6%   |   16.3%    |   1.2% 
Faulty Parallelism               |  90.7%   |    9.3%    |   0.0% 


Average number of punctuation errors: 4.8 


Usage 


In 2000, the largest number of usage errors occurred in the error category Wrong Word. Of the 
Satisfactory (3) papers, 77.8% had such errors as did 69.7% of the Excellent (5) papers. 




Nonetheless, there appears to be some improvement from 1993, when 83.1% of the Satisfactory 
(3) papers contained semantic errors. The second largest category, as in 1993, was Incorrect 
Prepositions, with 59.2% of students having trouble with idiomatic use of prepositions. The 
average number of usage errors was 5.0 in the 2000 Satisfactory (3) papers, compared with 5.9 in 
1993, and 4.2 in the Excellent (5) papers. The Satisfactory (3) papers’ imprecision gave the 
impression that students lacked many words with which to express themselves or that they were 
not sure what they wanted to say; they had difficulty choosing words with appropriate 
connotations or knowing how to use these words in context. The Excellent (5) papers contained 
more sophisticated diction, but also contained more idiomatic usage errors. The Excellent (5) 
papers, for example, had a higher percentage of errors in preposition usage than did the 
Satisfactory (3) papers. (“The daughters were bored of their lives”, “Windflower has a main 
character who will let nothing come in between of having a happy family and her,” and “You can 
choose the path in which you walk.”) 


Many of the difficulties in usage were illustrated by the awkward application of the topic word 
“perseverance.” Even though the exam defined the word for them, students often did not know 
other forms of the word or know how to use the word idiomatically or even sensibly. One 
student wrote, ironically enough, “Perseverance is anything that you make it to be.” Examples of 
difficulties in usage follow. 


“William Shakespeare uses the title character to show perseverance equates to his 
mother’s hasty marriage, Ophelia’s unrequited love, and his father’s untimely death.” 


“If you look at my piece of literature they persevere the obstacles that forever change the 
outcome of their lives.” 


“A strong will of perseverance leads to heroism.” 


“The sacrifices were not easy to make, especially at Ian’s age but he felt he would do 
anything to help achieve perseverance and take away his guilt.” 


“The second perseverance that Hagar faced in her life was that she had to live with her 
son Marvin.” 


“The mother is trying to use her perseverance of her strong belief in following family 
traditions to make her son feel guilty... The mother’s perseverance finally collapses upon 
her.” 


These examples suggest that “perseverance” was a new word for students, or that they were 
unfamiliar with its use in context or with how to use a dictionary to find alternative forms. Few 
students, for example, used the verb “persevere,” and some created the word “perseverant” 
when they needed an adjective. 


In their discussions, reviewers referred to usage/semantics problems as “serious”. Problems with 
imprecision and with non-idiomatic usage interfere with communication in a way that spelling 
and punctuation errors, even though more numerous, do not. 




Table 3a: Incidence and Type of Usage Errors in Essays (June 2000) Receiving 
Satisfactory (3) for Matters of Convention 


Percentage of Papers Having... 
| 0 Errors | 1-3 Errors | 4+ Errors | 


Incorrect Adverbs 

Incorrect Prepositions 

Homonyms 

Wrong Word Semantics (Other than Verbs) 
Incorrect Semantic Relationship 
Redundancies 

Average number of usage errors: 5.0 


Table 3b: Incidence and Type of Usage Errors in Essays (June 2000) Receiving Excellent 
(5) for Matters of Convention 


Percentage of Papers Having... 
| 0 Errors | 1-3 Errors | 4+ Errors 


Incorrect Adverbs 
Incorrect Prepositions 


Homonyms 

Wrong Word Semantics (Other than Verbs) 
Incorrect Semantic Relationship 
Redundancies 

Average number of usage errors: 4.2 


Verbs and Pronouns 


As in 1993, there were few errors associated with Pronouns and Verbs in the Satisfactory (3) 
papers. In 2000, there was an average of 2.1 pronoun errors per essay and 3.3 verb errors per 
essay. Although verb errors were more numerous in 2000 than in 1993, the increase is likely 
attributable to the combining of the 1993 category Shift from Literary Present with Shift in 
Tense in 2000, the category in which 68.7% of students made errors (as compared with 35.7% in 
1993). There was over 10% improvement in subject-verb agreement and in choice of auxiliary 
verb. The overall incidence of pronoun errors in 2000 was not significantly different from those 
in 1993, although students improved significantly in their use of relative pronouns — with only 
18% making any relative pronoun errors as compared with 33.7% making such errors in 1993. 


Surprisingly, the Excellent (5) papers had a higher percentage of errors in pronoun agreement 
than did the Satisfactory (3) papers. The more frequent generalizations and longer sentences in 
the Excellent (5) papers can perhaps explain this anomaly. (“You cannot imagine what such a 
great person must have gone through, but we must listen to their stories so that the same sort of 
terrifying tragedy may never happen again, but what he did go through proves that even when 
people are faced with extreme adversity you must persevere”). 




Table 4a: Incidence and Type of Pronoun Errors in Essays (June 2000) 
Receiving Satisfactory (3) for Matters of Convention 


Percentage of Papers Having... 
| 0 Errors | 1-3 Errors | 4+ Errors 


Indefinite Pronoun Reference 
Agreement with Antecedent 
Relative Pronoun 

Possessive Pronouns 

Incorrect Pronoun Case 

Average number of pronoun errors: 2.1 


Table 4b: Incidence and Type of Pronoun Errors in Essays (June 2000) 
Receiving Excellent (5) for Matters of Convention 


Percentage of Papers Having... 
| 0 Errors | 1-3 Errors | 4+ Errors | 


N=86 
Indefinite Pronoun Reference 
Agreement with Antecedent 
Relative Pronoun 
Possessive Pronouns 
Incorrect Pronoun Case 
Average number of pronoun errors: 1.4 


Spelling 


Both the 1993 and 2000 studies showed that students who received Satisfactory (3) on their 
essays can generally spell adequately, but the average number of spelling errors increased from 
4.1 per essay in 1993 to 5.3 in 2000. As one would expect, the most numerous errors were 
classified as General Spelling errors with 89.6% of the students making at least one mistake and 
50.7% making four or more errors, up from 36.3% making four or more errors in 1993. In 2000, 
the incidence of spelling errors in compound words also increased significantly, with 29.9% of 
the students making one to three errors as compared with 16.9% in 1993. Errors in compound 
words are those that involve separating two words that should be joined (“can not,” “a part” 
when a student means “apart”) or joining two words that should be separated (“infact,” “alright,” 
“bestfriend”). The incidence of errors in Y for I or ei/ie, however, decreased from 17.5% of the 
papers in 1993 to 4.3% in 2000. The students who received Excellent (5) had an average of only 
2.4 spelling errors, which is remarkable considering the length of the papers, the variety and 
sophistication of their diction, and the time constraints under which those students were working. 


Table 6a: Incidence and Type of Spelling Errors in Essays (June 2000) 
Receiving Satisfactory (3) for Matters of Convention 


Percentage of Papers Having... 
| 0 Errors | 1-3 Errors | 4+ Errors |


Double Letters
Y for I or ei/ie
Prefixes/Suffixes 


Average number of spelling errors: 5.3 


Table 6b: Incidence and Type of Spelling Errors in Essays (June 2000) 
Receiving Excellent (5) for Matters of Convention 


Percentage of Papers Having... 


N=211 


Double Letters 
Y for I or ei/ie
Prefixes/Suffixes 


Average number of spelling errors: 2.4 


Judgements Regarding Appropriateness of Score for Matters of Correctness 


Figure 2a reveals that markers in 2000 were more accurate in assessing Satisfactory (3) papers in
Matters of Correctness. The change perhaps reflects the greater precision of the descriptors in
the 2000 marking key, a change recommended by the review committee in 1993 and
implemented in 1994. Nonetheless, the reviewers in 2000 still considered 38.8% of the papers
awarded a Satisfactory (3) score as below that standard in Matters of Correctness. And they
considered 42.9% of the papers awarded Excellent (5) as below that standard, particularly in
sentence construction and usage.




Figure 2a: Most Appropriate Score for Matters of Convention in Essays Reviewed

[Figure: Percentage of Papers (vertical axis) by Appropriate Score (horizontal axis: Poor (1), Limited (2), Satisfactory (3), Proficient (4)); series: 1993 and 2000]

1993*: N=152
2000*: N=209

*1993: 8 papers were not rescored
*2000: 2 papers were not rescored

Figure 2b: Most Appropriate Score for Matters of Convention in Essays Reviewed

[Figure: Percentage of Papers (vertical axis) by Appropriate Score (horizontal axis: Poor (1), Limited (2), Satisfactory (3), Proficient (4), Excellent (5)); series: 2000 (3's) and 2000 (5's)]

2000 (3's)*: N=209
2000 (5's)*: N=84

*2 papers of both the 3's and the 5's were not rescored


Summary of Most Frequent Errors 


Tables 7a and 7b summarize the incidence of the most frequent errors in the 2000 essays 
reviewed. Frequency, however, may not be the most important cause for concern. Different 
types of errors are not of equal weight or impact. While mechanical errors (spelling and 
punctuation) are the most obvious and numerous, errors in sentence construction and usage 
interfere more seriously with clear expression and are more difficult to correct because they 
reflect imperfect understanding or confused thought. When students lack the precise vocabulary
and syntax through which to convey thought, they will certainly have difficulty communicating
effectively with others.


Table 7a: Most Frequent Errors in Conventions of Language in "Satisfactory" June 
2000 English 30 Diploma Exam Essays 


Percentage of Papers Having... 
| 0 Errors | 1-3 Errors | 4-5 Errors | 6-8 Errors | 9-10 Errors | 11+ Errors |


Punctuation 

Comma Error 

Apostrophe 

Sentence Structure/Construction 
Shift in Construction 

Comma Splice 

Usage 

Incorrect Prepositions 

Word Wrong (Semantics) 
Pronouns 

Indefinite Pronoun Reference 
Agreement with Antecedent 
Verbs 

Shift in Tense 

Wrong Tense 

Spelling 

General Spelling 




Table 7b: Most Frequent Errors in Conventions of Language in "Excellent" June 
2000 English 30 Diploma Exam Essays 


Percentage of Papers Having... 
| 0 Errors | 1-3 Errors | 4-5 Errors | 6-8 Errors | 9-10 Errors | 11+ Errors |


Punctuation 

Comma Error 

Apostrophe 
Hyphenation/Dash/Parentheses 
Sentence Structure/Construction 
Shift in Construction 

Misplaced or "wrong" Modifiers 
Usage 

Incorrect Prepositions 

Word Wrong (Semantics) 
Pronouns 

Indefinite Pronoun Reference 
Agreement with Antecedent 
Verbs 

Shift in Tense 

Wrong Tense 

Spelling 

General Spelling 


Summary Comments 2000 Papers 


The reviewers, as one would expect, observed a dramatic difference between the Satisfactory (3) 
and the Excellent (5) papers in length and in complexity and sophistication of literary analysis. 
Students writing the Satisfactory (3) papers often took a more literal and concrete approach to
the topic, whereas those writing the Excellent (5) papers assumed a more global perspective and
explored abstract ideas. The Satisfactory (3) writers tended to develop the topic, not through a 
series of events or examples; these papers made largely mechanical reference to the topic. The 
Excellent (5) writers were more likely to be original in using the assigned topic to develop 
philosophical ideas; they used carefully selected events, characters, literary terminology, or 
quotations only as support, not as a central focus. Students producing Satisfactory (3) papers 
generally took a safe, predictable approach to the topic — that perseverance was good — and then 
described a character that persevered. By contrast, students producing the Excellent (5) papers 
revealed a greater willingness to take risks; they often examined the subject of perseverance as 
having the potential to be destructive as well as beneficial. These students frequently employed 
comparisons of characters or of different works of literature, or made references to irony or 
symbolism, to develop this complexity. 


The students who wrote Satisfactory (3) papers, while earnest and conscientious, often sounded
uncertain and mechanical, as though they lacked clarity or conviction about what they were 
writing. These papers gave the impression of having been written sentence by sentence, often 
without much connection between one clause, sentence, or paragraph and the next. The Excellent
(5) papers, however, were characterized by a more individualistic and confident voice that both
intrigued and carried the reader along. These papers were more likely to have a strong 
controlling idea, a more thoughtful organization, and consequently, smoother and more logical 
transitions between ideas. Thus in the Excellent (5) papers, even those containing mechanical 
errors, the readers could easily follow the train of thought. In the Satisfactory (3) papers, the 
errors, particularly those in sentence construction and usage, were more distracting. 


In both the Satisfactory (3) and the Excellent (5) papers, the most serious problems, though not 
necessarily the most numerous, were in sentence construction and in diction or usage. The 
difficulties in sentence construction in the Satisfactory (3) papers, however, seemed to reflect 
muddled or limited thinking, whereas the sentence construction difficulties in the Excellent (5)
papers seemed to reflect complex thinking. On the whole, the sentences in the Excellent papers 
were longer with much more sophisticated subordination. Because these papers often had so 
many ideas “jostling for place,” the students sometimes got lost or had trouble with punctuation. 
As well, the students writing the Excellent (5) papers were more ambitious in their desire to use 
varied and sophisticated diction or to be “fancy,” a desire that sometimes obscured meaning (“In 
a society of superficiality, where reality is disclosed against an assembly of flirtatous balls and 
social gatherings, one’s spirits can be hindered by illusive demeanor; it is when trying to surpass 
the artificiality that one must persist to create their own standards of happiness and discover 
conjugal felicity”). One of the reviewers, however, described the shortcomings of such papers as 
“fascinating in failure” because these students were willing to take risks, to venture further out 
on the intellectual and stylistic limb. Because the Excellent (5) papers were much longer than 
the Satisfactory (3) papers, the reviewers felt that errors might reflect that the students had been 
rushed for time. That is, mechanical problems were the result of the students’ having so much to 
say. 


Length and Complexity 


In addition to tabulating errors according to the five categories discussed above, the committee 
estimated the length of each essay. The committee also reviewed the complexity of thought and 
details, matter of choice, and organization of each essay, and ranked the essays for complexity on 
a five-point scale (i.e., 1 None, 2 Inadequate, 3 Adequate, 4 Fairly High, and 5 High).


Table 8a: Length and Complexity in "Satisfactory" June 2000 English 30 Diploma Exam Essays 


Percentage of Papers for which the Degree of Complexity was Deemed... 


| Inadequate | Adequate | Fairly High | Total |




Table 8b: Length and Complexity in "Excellent" June 2000 English 30 Diploma Exam Essays 


Percentage of Papers for Which the Degree of Complexity was Deemed...

N=81 | Inadequate | Adequate | Fairly High | High | Total
400-600 words | -- | 2.5% | 7.4% | 1.2% | 11.1%
600 + words | 1.2% | 3.7% | 30.9% | 53.1% | 88.9%
Total | 1.2% | 6.2% | 38.3% | 54.3% | 100.0%


Of the 86 Excellent (5) essays reviewed, 11.1% were 400 to 600 words long and 88.9% were
over 600 words. In terms of degree of complexity, 1.2% of the Excellent (5) papers were 
considered to be inadequate, 6.2% adequate, 38.3% fairly high, and 54.3% high. Overall, 84% 
of the papers were over 600 words in length with fairly high to high complexity. 
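
The totals quoted for the Excellent (5) essays can be checked against the cells of Table 8b. The following is a minimal Python sketch and not part of the original study; the dictionary layout and function names are assumptions made for illustration, and the empty 400-600 words / Inadequate cell is treated as zero.

```python
# Cell percentages transcribed from Table 8b (Excellent (5) essays).
# Assumption: the empty 400-600 words / Inadequate cell is 0.0.
LEVELS = ["Inadequate", "Adequate", "Fairly High", "High"]
table_8b = {
    "400-600 words": [0.0, 2.5, 7.4, 1.2],
    "600 + words": [1.2, 3.7, 30.9, 53.1],
}


def row_totals(table):
    """Share of papers in each length band (sum across complexity levels)."""
    return {length: round(sum(cells), 1) for length, cells in table.items()}


def column_totals(table):
    """Share of papers at each complexity level (sum across length bands)."""
    return {
        level: round(sum(row[i] for row in table.values()), 1)
        for i, level in enumerate(LEVELS)
    }


def long_and_complex(table):
    """Share of papers over 600 words rated Fairly High or High."""
    row = table["600 + words"]
    return round(row[LEVELS.index("Fairly High")] + row[LEVELS.index("High")], 1)


print(row_totals(table_8b))
print(column_totals(table_8b))
print(long_and_complex(table_8b))
```

Under these assumptions the row totals, column totals, and the combined share for long, complex papers reproduce the 11.1%, 88.9%, and 84% figures given in the text.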


Of the 211 Satisfactory (3) essays selected, 24.9% were between 200 and 400 words, 44.3% 
between 400 and 600 words, and 30.8% over 600 words. In terms of degree of complexity, 0.5% 
of the papers were considered to be less than inadequate, 40.3% inadequate, 51.7% adequate, and 
7.5% fairly high. In contrast to the 1993 results in which 4 of the 160 essays were considered to 
be highly complex, none of the 211 Satisfactory (3) papers in 2000 was ranked as having high 
complexity. Overall, 68.2% of the papers were considered to be inadequately to adequately 
complex, and their length fell in the range of 400 to 600 words or over 600 words. 


Table 9a: Average Number of Errors per Paper by Length and 
by Complexity for Excellent (5) Essays - June 2000 


Average Number of Errors per Paper for 
Which the Degree of Complexity was 
Deemed... 


N=] e Adequate| High 
600 + words 38 28.3 26.2 


Figure 3a: Average Number of Errors per Essay by Complexity and by Length
Students with "5" on Language Conventions - June 2000

[Figure: Average Number of Errors (vertical axis) by Complexity Category (horizontal axis: Inadequate, Adequate, Fairly High, High); series: 400-600 words and 600 + words]


Table 9a shows the average number of errors per Excellent (5) paper in relation to length and 
complexity. The same information is graphically represented in Figure 3a. 


As expected, for papers having the same complexity, the incidence of errors increased as the 
length increased. Perhaps the longer the essay, the more rushed the student was for time and the more opportunity
there was to make errors. Nevertheless, contrary to expectations, the incidence of errors decreased as the
complexity increased. It appears that a relatively short (400 to 600 words) but highly complex 
essay was optimal and desirable in terms of minimizing errors. 


Table 9b: Average Number of Errors per Paper by Length and by 
Complexity for Satisfactory (3) Essays - June 2000 


Average Number of Errors per Paper for Which
the Degree of Complexity was Deemed ...

N=201 | None | Inadequate | Adequate | Fairly High
200-400 words | 18.0 | 26.1 | 21.3 | 33
400-600 words | | 33.7 | 30.1 | 28
600 + words | | | 40.9 | 31.3




Figure 3b: Average Number of Errors per Essay by Complexity and by Length
Students with "3" on Language Conventions - June 2000

[Figure: Average Number of Errors (vertical axis) by Complexity Category (horizontal axis: None, Inadequate, Adequate, Fairly High); series: 200-400 words, 400-600 words, and 600 + words]


Table 9b provides the average number of errors per Satisfactory (3) paper in relation to length 
and complexity. The same information is visually presented in Figure 3b. 


Similar to what was found for the Excellent (5) papers, of the 211 Satisfactory (3) papers 
reviewed, the incidence of errors increased as the length increased. Nevertheless, the
relationship between the length and the degree of complexity and incidence of errors is a
sophisticated and complex issue. For essays of 200 to 400 words, the incidence of errors tended 
to increase as the degree of complexity increased; whereas, for papers of 400 to 600 words, the 
incidence of errors dropped as the degree of complexity increased. For essays exceeding 600 
words, more errors occurred in the papers ranked as adequate than in papers with any other 
degree of complexity. In some cases, when students attempted complex discussion and/or 
unconventional structure, errors may reflect the conditions of writing during the examination. 


Potential Implications 
Implications for Markers 


Something potentially interesting to note is the proportion of errors made up of mechanical errors 
(spelling and punctuation) relative to more serious errors (sentence construction and usage) in 
the Satisfactory (3) and Excellent (5) papers. In 2000, mechanical errors made up 44.8% of the
errors in the Satisfactory (3) papers but only 21.2% of those in the Excellent (5) papers. By contrast,
sentence construction/usage errors made up 37.7% of the errors in the Satisfactory papers but 45.5% of
those in the Excellent papers. Mechanical errors, by their nature, will be more numerous since every sentence offers
opportunities to spell or punctuate incorrectly; each sentence, however, offers only one or two
opportunities for sentence construction or usage errors. Since two markers both awarded each of
the Excellent (5) papers an “Excellent” in Matters of Correctness, even when those papers contained an
average of 19.8 errors, it may be that the bias of markers is either to penalize more heavily for
mechanical errors than for sentence construction/usage errors or to reward more complex and
usually longer papers by forgiving such errors. The marking guide encourages this:
“Proportion of error to complexity and length of response must also be considered.” Another
possible explanation is that markers, who are working very quickly, are not as readily identifying 
sentence construction/usage errors. Sometimes, in the Excellent (5) papers, reviewers thought 
that pretentious language and convoluted sentence structure as well as glibness were unduly 
rewarded. 


Implications for Further Study 


The study did not address the types of errors that seemed more numerous in computer-generated 
papers, specifically, the higher incidence of homonym errors and incomplete constructions 
(words left out). The computer-generated papers often seemed to reflect a greater carelessness, 
suggesting perhaps that students are lulled into a false sense of security by the spell-check and 
grammar programs in their computers and thus do not feel the need to proofread. Since this 
observation is largely anecdotal, the reviewers thought that a subsequent study might be useful to 
examine whether there is indeed a significant difference in the nature of errors between 
computer-generated and hand-written papers, and whether the computer-generated papers were 
more or less appropriately scored. (What percentage, for example, of the Excellent (5) papers 
were computer-generated?) 


Implications for Curriculum Development/Teachers 


Like the 1993 study, the 2000 review of Matters of Correctness confirmed that spelling and
punctuation errors, while more numerous, are not as serious as problems in sentence construction 
and usage. Sentence structure and usage problems are as much a liability in Excellent (5) papers 
as they are in Satisfactory (3) papers even though the causes of such problems may be different. 
All students will potentially benefit from the following: 


* extensive reading, including listening to fluent prose and poetry read aloud 

* frequent writing practice, with particular attention paid to revision and proofreading, 
especially as computer-generated writing becomes more widespread 

* constant critical analysis of writing strengths and weaknesses by peers and teachers 

* some integrated study of grammar, with emphasis on coordinating and subordinating ideas
more fluently and correctly

* exposure to and practice in applying a variety of syntactical models so that students develop 
alternative ways of constructing sentences 

* extensive effort aimed at extending vocabularies, raising understanding of word roots and 
connotations, and increasing flexibility in transforming words from one part of speech to 
another as appropriate for a given context 

* increased practice in appropriate use of a dictionary, thesaurus, and book of prepositional 
idioms as tools to improve both reading and writing 


Clear, correct, and fluent expression must be an expectation in every course and at every level of 
education. Improvement in language comprehension and use is a long evolutionary process, one
that cannot be rushed. It requires systematic and coordinated instruction, reinforcement, and
refinement throughout the grades. Even very competent students, as they grapple with 
increasingly complex ideas, will benefit from further instruction in language use in post-secondary
institutions. No single educator can do it all, and no educator should think himself or
herself exempt from doing his or her bit in the process. The goals of clear, complex thinking and 
clear, cogent expression demand a universal commitment. 




APPENDIX 3 


Cognitive Analysis of Specific Examinations 


Achievement-Over-Time Study 


Final Report 


Philip Taranger 


September 2000 




Study Participants 


Mathematics 30 
Gerald W. Krabbe, Central Memorial High School 
J. E. (Ted) Lewis, University of Alberta 
Chemistry 30 
Gary Glover, Strathcona-Tweedsmuir High School 
John Washington, Concordia University College 
English 30 


Lynne Gregg, Archbishop MacDonald High School 
Shyamal Bagchee, University of Alberta 




Project Overview 


What was the purpose of the study? 


The cognitive analysis of specific examinations was designed to evaluate whether or not the 
cognitive demands of diploma examinations in three subject areas — Mathematics 30, Chemistry 
30, and English 30 — had changed over the fifteen-year period 1985 to 2000. The examinations 
were measured according to two sets of criteria: one drawn from Alberta diploma exam blueprint 
classifications, the other from the national School Achievement Indicators Program (SAIP) 
criteria. 


What is SAIP and why were its assessment criteria used for the study? 


The School Achievement Indicators Program (SAIP) was initiated by the Council of Ministers of
Education, Canada (CMEC) in 1989 in order to arrive at a consensus on the elements of a national
assessment. The SAIP examinations are administered to 13- and 16-year-old students across
Canada and assess achievement in reading and writing (1994 and 1998), mathematics (1993 and 
1997), and science (1996 and 1999). The SAIP exams are unique in that they are a national 
assessment of achievement, and as such provide an external set of criteria by which to evaluate 
our own examinations. They also represent a national agreement on assessment criteria. It is for 
this reason that the SAIP criteria were used for the cognitive analysis study. 


What was the study methodology? 


Each of the three subject areas was represented by a team of two: a secondary teacher and a post-secondary
instructor. Following a training session on applying the SAIP criteria
and the Alberta diploma exam blueprint criteria, each team conducted a question-by-question 
analysis of the three examinations. For the purposes of the study, the identifying years of exam 
administration were removed from the examination copies used by the study participants. 


The procedure involved the members of each team working through each exam individually. 
Following this individual analysis, the two members of each team compared the results, noting 
and discussing any discrepancies. The results were then summarized and recorded. 
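
The compare-and-discuss step described above can be pictured as a simple reconciliation of two sets of classifications. The sketch below is illustrative only: the function name, the data layout, and the sample classifications are hypothetical and are not drawn from the study, which does not describe any software being used.

```python
def compare_classifications(rater_a, rater_b):
    """Compare two question -> level mappings.

    Returns the agreement rate and a dict of questions the two
    raters classified differently (to be discussed by the team).
    """
    if not rater_a or rater_a.keys() != rater_b.keys():
        raise ValueError("Both raters must classify the same, non-empty set of questions")
    discrepancies = {
        question: (rater_a[question], rater_b[question])
        for question in rater_a
        if rater_a[question] != rater_b[question]
    }
    agreement = 1 - len(discrepancies) / len(rater_a)
    return agreement, discrepancies


# Hypothetical classifications, not data from the study.
teacher = {1: "3", 2: "4", 3: "5", 4: "5+"}
instructor = {1: "3", 2: "4", 3: "4", 4: "5+"}

agreement, to_discuss = compare_classifications(teacher, instructor)
print(agreement)   # share of questions classified identically
print(to_discuss)  # questions needing discussion, with both classifications
```

Only the questions returned in `to_discuss` would need to be talked through before the team records its summarized result.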


Following the analysis of all three examinations, the team members in each subject area 
compiled an overall summary report comparing the results of each examination. This overall 
summary formed the basis of determining the larger question of whether cognitive demands had 
changed over the period studied. The summary report also included written impressions from 
each team. 




Mathematics 30 Results 


A distinction is made between type of cognitive skill as opposed to level of cognitive demand.
Type indicates that a particular skill is being measured; level indicates a hierarchy of demand or
complexity. The Alberta diploma examination blueprint criteria distinguish types of cognitive 
skill and the SAIP criteria distinguish levels of cognitive demand. Since the primary purpose of 
the study was to measure differences in the level of cognitive demand over time, only the SAIP 
criteria are tabled and summarized in the Interpretation of Results. 


The Alberta diploma examination blueprint criteria used in the assessment of the Mathematics 30 
examinations consist of four categories: Knowledge (K), Comprehension (C), Application (A), 
and Higher Mental Activities (HMA). It is important to note that these types are not always 
distinct; in many cases, for example, a question could be classified as both Application (A) and 
as Higher Mental Activities (HMA). 


The SAIP assessment criteria are organized by levels numbered 1 through 5, with successive
numbers representing an increasing level of demand or complexity. As the SAIP examinations
are designed to assess the achievement of 13- and 16-year-old students and the Alberta diploma
examinations assess the achievement of Grade 12 students, it was necessary to modify the SAIP 
criteria prior to the cognitive analysis. This modification was performed by drawing key words 
from the descriptions in each level and linking them to specific skills required by students 
completing Mathematics 30. The modified SAIP mathematics criteria are outlined in Appendix 
A of this report. 


The following tables identify the percentage of questions at each cognitive level according to the 
SAIP criteria. Multiple choice questions are summarized in Table I, numerical response in Table 
II, and written response in Table III. 


Table I
Multiple Choice

SAIP Criteria | June 1985 | June 1990 | June 2000
Level 3 | 10% | 22.5% | 12.5%
Level 4 | 48% | 42.5% | 57.5%
Level 5 | 33% | 25% | 25%
Level 5+ | 9% | 10% | 5%



Table II
Numerical Response

SAIP Criteria: Level 3, Level 4, Level 5, Level 5+


Table III
Written Response

SAIP Criteria: Level 2, Level 3, Level 4, Level 5, Level 5+


Surviving column values (Tables II and III):
June 1985:
June 1990: 17%, 58%, 17%, 8%
June 1990: 20%, 20%, 20%, 40%
June 2000: 22%, 33%


Interpretation of Results


Caution should be exercised in interpreting the percentage conversions in the tables, particularly
in the numerical and written response. The number (n) of questions is less than 10, so a
difference of 10% may represent a single question. 
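
The arithmetic behind this caution is simply that each question accounts for 100/n percentage points. A minimal sketch follows; the function name and the question counts in the loop are illustrative assumptions, not counts reported in the study.

```python
def single_question_share(n_questions):
    """Percentage points contributed by one question in a section of n questions."""
    if n_questions <= 0:
        raise ValueError("n_questions must be positive")
    return 100.0 / n_questions


# Illustrative section sizes only; the study does not list exact counts here.
for n in (5, 6, 9, 10, 40):
    print(n, round(single_question_share(n), 1))
```

With ten or fewer questions, moving a single question from one level to another shifts a reported percentage by ten points or more, which is why small differences in the numerical and written response tables carry little weight.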


In general, the multiple choice questions on the three examinations indicate a comparable level 
of demand. The June 2000 examination has a lower percentage of Level 5/5+ questions than the 
other two examinations, but there are more questions at Level 4 on the June 2000 than on either 
the June 1985 or June 1990 examination. The June 2000 examination exhibits a more balanced 
distribution of questions among levels. 
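
The comparison in this paragraph can be reproduced from the Table I multiple choice percentages. The sketch below is an illustration only; the dictionary layout and the helper name are assumptions.

```python
# Mathematics 30 multiple choice percentages transcribed from Table I.
math_mc = {
    "June 1985": {"3": 10.0, "4": 48.0, "5": 33.0, "5+": 9.0},
    "June 1990": {"3": 22.5, "4": 42.5, "5": 25.0, "5+": 10.0},
    "June 2000": {"3": 12.5, "4": 57.5, "5": 25.0, "5+": 5.0},
}


def high_level_share(distribution):
    """Combined percentage of questions at Level 5 and Level 5+."""
    return distribution["5"] + distribution["5+"]


high = {exam: high_level_share(dist) for exam, dist in math_mc.items()}
level_4 = {exam: dist["4"] for exam, dist in math_mc.items()}

print(high)
print(level_4)
```

The combined Level 5/5+ share is lowest for June 2000 and the Level 4 share is highest, matching the description in the text.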




The June 1985 examination did not have a numerical response section, making accurate 
comparison between it and the June 1990 and June 2000 examinations difficult. The addition of 
the numerical response section certainly changes the nature of the cognitive demand, but it is 
impossible to determine whether or not its presence increases the complexity of the examinations 
as a whole, although this is a possible conclusion. In assessing the numerical response of the 
June 1990 and June 2000 examinations, the examinations indicate comparable levels of cognitive 
demand. 


In the written response, 50% of the questions on the June 2000 examination are at Level 4 
compared to 20% on the June 1985 and June 1990 examinations. Although the June 1985 examination
indicates a higher percentage of questions at Level 5+, the criteria used in the study do not 
measure precisely the significant difference in the style of these questions. The June 2000 
questions are embedded in a much more detailed context and require students to synthesize 
knowledge in order to arrive at a solution. Therefore the overall cognitive demand of the June 
2000 written response questions is approximately equal to those of the June 1985 examination. 


Conclusion 


The curriculum change in 1991 has altered the course content being tested, but does not appear 
to have altered the cognitive demand of subsequent examinations. What has changed over time is 
the nature of the questions, particularly in the written response. The weighting of the written 
response section has also increased, from 20% in 1985 and 1990 to 30% in 2000. When all 
factors are considered, the examinations have remained relatively consistent in their cognitive 
demands, although the nature of those demands has changed over time. 




Chemistry 30 Results 


A distinction is made between type of cognitive skill as opposed to level of cognitive demand.
Type indicates that a particular skill is being measured; level indicates a hierarchy of demand or
complexity. The Alberta diploma examination blueprint criteria distinguish types of cognitive 
skill and the SAIP criteria distinguish levels of cognitive demand. Since the primary purpose of 
the study was to measure differences in the level of cognitive demand over time, only the SAIP 
criteria are tabled and summarized in the Interpretation of Results. 


The Alberta diploma examination blueprint criteria used in the assessment of the Chemistry 30 
examinations consist of three categories: Knowledge (K), Comprehension and Application 
(C&A), and Higher Mental Activities (HMA). It is important to note that these types are not 
always distinct; in many cases, for example, a question could be classified as both 
Comprehension and Application (C&A) and as Higher Mental Activities (HMA). 


The SAIP assessment criteria are organized by levels numbered 1 through 5, with successive
numbers representing an increasing level of demand or complexity. As the SAIP examinations
are designed to assess the achievement of 13- and 16-year-old students in science (including
chemistry, physics and biology) and the Alberta diploma examinations assess the achievement of 
Grade 12 students in each of these areas separately, it was necessary to modify the SAIP criteria 
prior to the cognitive analysis. This modification was performed by drawing key words from the 
description of each level and linking them to specific skills required by Grade 12 students 
completing Chemistry 30. The modified SAIP science criteria are outlined in Appendix B of this 
report. 


The following tables identify the percentage of questions at each cognitive level according to the
SAIP criteria. Multiple choice questions are summarized in Table I, numerical response in Table
II, and written response in Table III.


Table I
Multiple Choice

SAIP Criteria | June 1985 | June 1990 | June 2000
Level 3 | 27% | 24% | 20%
Level 4 | 41% | 47% | 41%
Level 5 | 28% | 23% | 32%
Level 5+ | 4% | 6% | 1%




Table II
Numerical Response

SAIP Criteria | June 1985 | June 1990 | June 2000
Level 3 | -- | -- | 17%
Level 4 | -- | -- | 50%
Level 5 | -- | 57% | 33%
Level 5+ | -- | 43% | --


Table III
Written Response

SAIP Criteria | June 1985 | June 1990 | June 2000
Level 3 | -- | 16% | --
Level 4 | 16% | 50% | --
Level 5 | 34% | 17% | 100%
Level 5+ | 50% | 17% | --


Interpretation of Results 


Caution should be exercised in interpreting the percentage conversions in the tables, particularly 
in the numerical and written response. The number (n) of questions is less than 10, so a
difference of 16% may represent a single question. 


In the multiple choice, the June 2000 examination has a higher percentage of Level 5/5+ 
questions than the June 1985 examination. On the June 1985 examination, the questions are clearly and
briefly worded. There are many non-calculation questions that do not require linking skills or
the application of ideas. The June 1990 examination is similar in style to the June 1985 examination but features more
advanced questions; many were multiple-step problems with little direction or guidance 
provided. 
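
Because the tables report rounded percentages, a quick consistency check is to confirm that each examination's column sums to roughly 100%. The sketch below applies that check to the Chemistry 30 multiple choice figures as they appear in Table I; the function name and the one-point tolerance are assumptions chosen for illustration.

```python
# Chemistry 30 multiple choice percentages as printed in Table I
# (Levels 3, 4, 5, 5+ in order).
chem_mc = {
    "June 1985": [27, 41, 28, 4],
    "June 1990": [24, 47, 23, 6],
    "June 2000": [20, 41, 32, 1],
}


def columns_off_total(table, expected=100, tolerance=1):
    """Return columns whose percentages do not sum to about `expected`."""
    return {
        exam: sum(values)
        for exam, values in table.items()
        if abs(sum(values) - expected) > tolerance
    }


flagged = columns_off_total(chem_mc)
print(flagged)
```

As printed, the June 2000 column sums to 94% rather than 100%. The check only flags the discrepancy; it cannot say which figure is affected, so the printed values are left unchanged here.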




The June 1985 examination did not have a numerical response section, making accurate 
comparison between it and the June 1990 and June 2000 examinations difficult. The addition of 
the numerical response section certainly changes the nature of the cognitive demand, but it is 
impossible to determine whether or not its presence increases the complexity of the examination 
as a whole, although this is a possible conclusion. The expanded curriculum (effective 1994) 
means that more content is being tested on the June 2000 examination. For example, the concept 
of “equilibrium” is now being tested, and an increased level of acid-base knowledge is required. 
When all factors are considered, the numerical response section of the June 1990 and June 2000 
examinations are approximately equal in level of cognitive demand. 


In the written response, the June 1985 and June 1990 examinations were similar in question type 
and level of cognitive demand. Questions were clear, specific, and analytical in nature. 
Multiple-step calculations were required on several of the questions. All of the questions on the 
June 2000 examination were at Level 5, but the June 1985 and June 1990 examinations had some 
questions at Level 4. As in Mathematics 30, the June 1990 and June 2000 Chemistry 30 written 
response questions are open-ended, holistic, and require students to synthesize knowledge and to 
determine the criteria for a complete answer. The June 2000 questions also contain a significant 
amount of background material in the form of scenarios, and thus reading skills are important. 


Conclusion 


The June 2000 examination tests a considerably expanded curriculum that emphasizes the 
relationship of chemistry to the real world. A group of questions that cover several different 
topics and/or units are linked together by a “real world” scenario. Success on the June 2000 
examination requires that students know their chemistry and also requires a greater number of 
additional skills. The combination of questions from different units requires students to change 
their line of thinking more often and be able to recognize which concept in which unit is 
applicable to the question. The application scenarios require the students to read more 
information and determine which information is applicable to the question. In general, the 
holistic nature of the questions on the June 2000 examination makes them equal to or greater 
than the 1985 and 1990 questions in terms of level of cognitive demand. 




English 30 Results 


A distinction is made between type of cognitive skill as opposed to level of cognitive demand.
Type indicates that a particular skill is being measured; level indicates a hierarchy of demand or
complexity. The Alberta diploma examination blueprint criteria distinguish types of cognitive 
skill and the SAIP criteria distinguish levels of cognitive demand. Since the primary purpose of 
the study was to measure differences in the level of cognitive demand over time, only the SAIP 
criteria are tabled and summarized in the Interpretation of Results. 


The Alberta diploma examination blueprint criteria used to assess the English 30 examinations 
consist of three categories: Literal Understanding (LU), Inference and Application (IA), and 
Evaluation (E). Although generally speaking IA questions represent a higher level of cognitive 
demand than LU questions, E questions are not necessarily more complex than IA questions. An 
E designation means simply that the student is required to make the best choice of four 
alternatives, each of which may be partly correct. 


The SAIP criteria are organized by levels numbered 1 through 5, with successive numbers
representing an increasing level of difficulty. The SAIP reading criteria were used to assess both 
texts and questions. The Reading criteria identify texts and questions as being straightforward 
(at Level 1) through to sophisticated (Level 5). In addition, the levels draw distinctions between 
interpreting, evaluating, and exploring surface, implied, and complex meanings in texts and 
questions. The SAIP Reading Criteria are illustrated in Appendix C of this report. 


Table I identifies the number of texts and assignments at each cognitive level in the Part A:
Written Response section of the examination. Since the Part A: Written Response consists of 
only one (or two) texts and two (or three) assignments, the results in this table have not been 
converted into percentages. Table II identifies the percentage of texts at each cognitive level in 
the Part B: Multiple Choice section. Table III identifies the percentage of questions at each 
cognitive level for the Part B: Multiple Choice section. 


Table I


Part A: Written Response


                        June 1985             June 1990             June 2000
SAIP Criteria (Level)   Texts  Assignments    Texts  Assignments    Texts  Assignments
Level 3                 1      1              1      1              --     --
Level 4                 1      1              --     1              --     2
Level 5                 --     1              --     --             1      --




Table II


Part B: Multiple Choice (Texts)


SAIP Criteria (Level)   June 1985   June 1990   June 2000
Level 3                 50%         28%         25%
Level 4                 40%         29%         50%
Level 5                 10%         43%         12.5%
Level 5+                --          --          12.5%


Table III


Part B: Multiple Choice (Questions)


SAIP Criteria (Level)   June 1985   June 1990   June 2000
Level 2                 --          4%          --
Level 3                 62%         29%         34%
Level 4                 33%         50%         55%
Level 5                 5%          17%         11%


Interpretation of Results 


Caution should be exercised in interpreting the percentage conversions in Table II. The number
(n) of texts is less than 10, so a difference of 10% may represent a single text. 




Part A: Written Response 


The Written Response section in June 1985 combined two simple texts with one simple and two 
complex assignments. In June 1990, one simple text was combined with one simple and one 
complex assignment. In June 2000, one sophisticated text was combined with two complex 
assignments. The June 2000 examination represents a significantly higher level of cognitive 
demand than either the June 1990 or June 1985 examinations. 


Part B: Multiple Choice 


The texts on both the June 1990 and June 2000 examinations were at a higher level of cognitive 
demand than the texts on the June 1985 examination. Although there was a greater number of 
SAIP Level 5 texts on the June 1990 examination, the texts on the June 2000 examination were 
considerably longer, more dense, and thus higher in cognitive demand overall. 


On the June 1985 examination, the great majority of questions were straightforward or somewhat 
more complex, with only five questions of 80 at Level 5. There was a greater number of Level 5 
questions on the June 1990 examination than on the June 2000 examination, but there were no 
Level 2 questions on the June 2000 examination. 


Conclusion 


The Part A: Written Response of the English 30 diploma examination demonstrates a significant 
increase in cognitive demand over the period studied, with a more consistent balance between 
text and task being achieved. Both texts and assignments exhibit a higher level of cognitive 
demand in 2000 than in 1985. The Part B: Multiple Choice section has likewise become more 
complex over time, with both texts and questions making considerably higher cognitive demands 
of students. 




Appendix A 


Modified SAIP Mathematics Criteria 


As questions at SAIP Level 1 were not anticipated on the Grade 12 diploma examination, only 
Levels 2 through 5+ were modified from existing criteria. 


Level 2   the ability to draw diagrams from information provided

Level 3   the ability to apply basic knowledge and definitions to straightforward questions
          involving numerical calculations

Level 4   the ability to recognize and understand the direction to take in solving a problem
          the ability to use introductory problem solving, apply algebraic abilities, perform
          one-step calculations, and apply straightforward procedures

Level 5   the ability to apply knowledge and combine ideas
          the ability to apply two or more distinct procedures
          the ability to solve problems requiring interpretation, analysis, and application

Level 5+  requires the abilities outlined in Level 5 AND one or more of:
          generalization
          new or unfamiliar settings
          justification and procedures




Appendix B 


Modified SAIP Science Criteria 


As questions at SAIP Science Level 1 or 2 were not anticipated on the Grade 12 diploma 
examination, only Levels 3 through 5 were modified from existing criteria. 


Level 3   basic knowledge or definition
          simple application of definitions
          use of information from data sheet
          basic calculations

Level 4   knowledge question requiring the combination of two or three separate ideas
          reasoning required
          calculation question requiring recognition and a single (but significant)
          single-step calculation
          interpretation of given experimental data

Level 5   knowledge question requiring the combination of numerous significant ideas or
          the application of several pieces of data
          calculation question requiring two or more significant operations
          design of experiment and prediction and interpretation of results




Appendix C 
SAIP Reading Criteria 


Level 1 - The student reader interprets, evaluates, explores surface meanings from straightforward 
texts and some meaning from more complex texts by: 


responding to vocabulary, syntax, concrete details, directly stated ideas or key points;
making judgments about purpose, content, or relationships;
exploring in the context of personal experience.


Level 2 - The student reader interprets, evaluates, explores surface and/or directly implied meanings 
from straightforward texts and some meaning from more complex texts by: 


responding to concrete details, strongly implied ideas, or key points 
making supported judgments about purpose, content, or relationships; 
exploring in the context of personal experience and understanding. 


Level 3 - The student reader interprets, evaluates, explores complex meanings in complex texts and 
some meaning from sophisticated texts by: 


responding to more abstract language, details, ideas 
making informed judgments about purpose, content, or relationship among elements 
exploring and demonstrating personal understanding and appreciation 


Level 4 - The student reader interprets, evaluates, explores complex meaning in complex texts and in 
some sophisticated texts by: 


responding to more subtle and/or implicit language, details, and ideas;
making well-supported judgments about purpose, content, or relationships 
exploring and integrating a thoughtful understanding and appreciation 


Level 5 - The student reader interprets, evaluates, explores complex meanings in sophisticated texts 
and questions by: 


responding to elements of style, selection of details, matters of organization and characterization
and complex ideas;
making insightful and substantiated relationships between content, purpose, and style;
exploring and integrating insightful and substantial understanding and appreciation.




APPENDIX 4 


A Probe into the Perceptions of High School Teachers and 
Post-Secondary Instructors Regarding the ‘Preparedness’ of 
Students for Post-Secondary Studies 


Dr. Lewis Callahan 
September 2000 


Statement of the Purpose 


There were two purposes to this study. The first was to assess college, university, and technical 
institute instructors’ perceptions of the adequacy of skill competencies of incoming post- 
secondary freshmen in the following areas: English, Mathematics, and Chemistry. The second 
purpose was to assess high school teachers’ perceptions with respect to how well high school 
graduates are prepared for post-secondary studies, in the same three areas: English, 
Mathematics, and Chemistry. 


Methods 
Participants 


Four different samples were used in this study. The first convenience sample included 77 
college, university, and technical institute instructors from the following institutions: 

Grant MacEwan, NAIT, University of Alberta, University of Calgary, Mount Royal College, 
SAIT, and Red Deer College. Fifty of the 77 participants in this sample were at least 46 years 
old; 51 were male, 26 were female.


Three separate and independent samples of high school diploma examination markers,
who taught in each of the three subjects under consideration, were used to gauge high school
teachers’ perceptions. There were 155 English teachers in the English 30 sample; 77 chemistry
teachers in the Chemistry 30 sample; and 48 mathematics teachers in the Mathematics 30
sample.


Instruments 


The instrument used to assess college, university, and technical institute instructors’ perceptions
contained 11 closed-ended questions. The three instruments used to collect high school teachers’
perceptions each contained eight Likert-style questions, with a space for optional open-ended
responses below each question. The high school instruments were identical in structure;
they differed only in the subject matter about which they sought information.


Procedures 


Each of the surveys was created by the researcher, with modifications made after consultation
with a panel of experts from the Department of Learning.


The Director, Institutional Planning & Coordination, Alberta Learning, sent letters to the
Vice-President of Curriculum at each of the seven post-secondary institutions that were chosen to
participate in the study. The letter asked for names of individual post-secondary instructors
whom the researcher could contact to respond to the survey.


The researcher met with each Vice-President (or their assistant) upon arriving at each campus.
In addition to contacting the instructors whose names had been provided, the researcher asked for
and received permission to approach other instructors for their input while on campus. Instructors were
asked to volunteer four minutes of their time to answer questions from the survey, posed and
recorded by the researcher.


For the grade 12 teacher surveys, Alberta Learning managers leading the diploma examination
marking sessions read the instructions for each of the three high school surveys to the intended
samples and collected the completed surveys.


The researcher entered the data from the post-secondary instructors’ surveys into SPSS 6.1. A
column was created for each of the 11 variables of interest (i.e., the 11 questions on the survey). An
additional variable, “Identification”, was created and numbered one to 77, for each of the
completed surveys.


The data for each of the eight questions from each of the three high school perception surveys
were summarized by System Improvement and Reporting, Alberta Learning. The researcher
used these summaries to conduct the appropriate statistical analyses.


Following convention, tests for statistical significance were based
on an alpha of .05.


Data Analysis 


Data analysis for the post-secondary surveys consisted of chi-squares and frequencies for the
variables under examination.


Chi-squares were conducted to determine whether statistically significant differences existed
between the proportions actually observed and the proportions expected, for the following sets of
variables:


1) Subject matter that one teaches, with perceived importance for an incoming freshman student
to possess grade twelve proficiency in English, mathematics, and chemistry respectively.


2) Age range of the participant, with perceived importance for an incoming freshman student to
possess grade twelve proficiency in English, mathematics, and chemistry respectively. There
were initially four age categories in the survey of college, university, and technical institute
instructors: Less than 25, 26-35, 36-45, and 46+. Since no one in the survey was less than 25,
and the only respondent in the 26-35 range gave his age as 35, the categories were
collapsed into two for statistical analysis: 35 to 45 and 46+.


3) Gender of the participant, with perceived importance for an incoming freshman student to
possess grade twelve proficiency in English, mathematics, and chemistry respectively.




Frequencies were tabulated for questions six through eight on the post-secondary survey, which
pertained to the perceived importance for incoming freshmen to possess grade twelve proficiency in
English, mathematics, and chemistry in order to do well in the subject taught by the instructor.


Frequencies and descriptive statistics were calculated for questions nine through 11 on the
post-secondary survey, which asked the respondent whether the skill sets and competencies of
incoming freshmen had improved, stayed about the same, or declined over the past 10 years for
each of the following subject areas: English, mathematics, and chemistry.


Chi-squares were run to determine whether there were statistically significant differences
between a) age categories, b) gender categories, and c) subject-area-taught categories in the
perceptions of college, university, and technical institute instructors about whether freshmen’s
skill sets and competencies in 1) mathematics, 2) English, and 3) chemistry have improved,
stayed the same, or declined over the past 10 years.


Frequencies and descriptive statistics were calculated for each of the eight identical questions, 
for each of the three high school surveys: English 30, Mathematics 30, and Chemistry 30. 


Results 


College, University, and Technical Institute Instructors 


Questions 1-5 collected demographic data. 


For question #6, “Do the subjects you teach require, at a minimum, twelfth grade proficiency in
English?”, 70 responded “yes” and 7 “no”, for a total of 77 responses. The response rate was
90.9% affirmative.


The Chi-Square comparing the four different categories of subject matter taught (English,
mathematics, chemistry, and “other”) with perceived need to have students with at least grade 12
proficiency in English showed no statistically significant differences, Chi-Square (3, N=77) =
3.63175, p < .30407.


The Chi-Square comparing the two different age categories (35-45 and 46+) with the
perceived need to have students with at least grade twelve proficiency in English showed no
statistically significant differences, Chi-Square (1, N=77) = 0.20533, p < .65045.


The Chi-Square comparing gender with the perceived need to have students with at least grade 
twelve proficiency in English, showed no statistically significant differences, Chi-Square (1, 
N=77) = 0.09291, p < .76051.
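
As a rough cross-check of the three question #6 results above, the sketch below recomputes the upper-tail probability for each reported Chi-Square statistic and compares it with the alpha of .05 stated in the Procedures. It is an illustration only, not the SPSS procedure the researcher used: the function name `chi2_p_value` is hypothetical, and the closed-form expressions it relies on apply only to 1 and 3 degrees of freedom.

```python
import math

ALPHA = 0.05  # significance level stated in the Procedures section


def chi2_p_value(stat: float, df: int) -> float:
    """Upper-tail probability of a chi-square statistic (df = 1 or 3 only)."""
    if stat < 0:
        raise ValueError("a chi-square statistic cannot be negative")
    root = math.sqrt(stat / 2.0)
    if df == 1:
        return math.erfc(root)
    if df == 3:
        # Closed form of the regularized upper incomplete gamma Q(3/2, stat/2).
        return math.erfc(root) + math.sqrt(2.0 * stat / math.pi) * math.exp(-stat / 2.0)
    raise ValueError("this sketch only covers df = 1 and df = 3")


# Question #6 statistics as reported above: (comparison, statistic, df)
reported = [
    ("subject matter taught", 3.63175, 3),
    ("age category", 0.20533, 1),
    ("gender", 0.09291, 1),
]

for label, stat, df in reported:
    p = chi2_p_value(stat, df)
    verdict = "significant" if p < ALPHA else "not significant"
    print(f"{label}: Chi-Square({df}) = {stat}, p = {p:.5f} ({verdict})")
```

The printed probabilities can be set beside the p values reported in the text; a comparison is judged significant only when its probability falls below .05.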


For question #7, “Do the subjects you teach require, at a minimum, twelfth grade proficiency in
mathematics?”, 57 responded “yes” and 20 “no”, for a total of 77 responses. The response rate
was 74.0% affirmative.




The Chi-Square comparing the four different categories of subject matter taught (English,
mathematics, chemistry, and “other”) with perceived need to have students with at least grade 12
proficiency in mathematics showed statistically significant differences, Chi-Square (3, N=77) =
28.76055, p < .000001.


The Chi-Square comparing the two different age categories (35-45 and 46+) with the
perceived need to have students with at least grade twelve proficiency in mathematics showed
no statistically significant differences, Chi-Square (1, N=77) = 0.28899, p < .59087.


The Chi-Square comparing gender with the perceived need to have students with at least grade 
twelve proficiency in mathematics, showed statistically significant differences, Chi-Square (1, 
N=77) = 8.31393, p < .00393. 


For question #8, “Do the subjects you teach require, at a minimum, twelfth grade proficiency in
chemistry?”, 29 responded “yes” and 48 “no”, for a total of 77 responses. The response rate was
37.7% affirmative.


The Chi-Square comparing the four different categories of subject matter taught (English,
mathematics, chemistry, and “other”) with perceived need to have students with at least grade 12
proficiency in chemistry showed statistically significant differences, Chi-Square (3, N=77) =
48.04510, p < .000001.


The Chi-Square comparing the two different age categories (35-45 and 46+) with the perceived 
need to have students with at least grade twelve proficiency in chemistry, showed no statistically 
significant differences, Chi-Square (1, N=77) = 0.00692, p < .93368. 


The Chi-Square comparing gender with the perceived need to have students with at least grade 
twelve proficiency in chemistry, showed no statistically significant differences, Chi-Square (1, 
N=77) = 1.20561, p < .27220. 


For question #9, “For mathematics, have the skill sets and competencies of incoming freshman
over the past 10 years: a) improved, b) stayed about the same, c) declined”, four responded
“improved”, 34 “stayed about the same”, and 36 “declined”, for a total of 74 responses. The mean is
2.432, and the mode is 3.0 (i.e., “declined”).
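
The mean and mode reported for question #9 follow directly from the response counts. The sketch below recomputes them, assuming the coding described in the Discussion (1 = improved, 2 = stayed about the same, 3 = declined); the helper name `summarize` is hypothetical and is not part of the original analysis.

```python
def summarize(counts: dict[int, int]) -> tuple[float, list[int]]:
    """Return the mean and the mode(s) of coded responses given as {code: frequency}."""
    total = sum(counts.values())
    mean = sum(code * n for code, n in counts.items()) / total
    top = max(counts.values())
    # Every code that reaches the highest frequency is kept, so ties stay visible.
    modes = sorted(code for code, n in counts.items() if n == top)
    return mean, modes


# Question #9 (mathematics): 4 improved, 34 stayed about the same, 36 declined
q9_counts = {1: 4, 2: 34, 3: 36}
mean, modes = summarize(q9_counts)
print(f"n = {sum(q9_counts.values())}, mean = {mean:.3f}, mode(s) = {modes}")
```

Returning the modes as a list is a deliberate choice: when two response categories receive the same number of answers, both appear in the result.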


The Chi-Square comparing the four different categories of subject matter taught (English,
mathematics, chemistry, and “other”) with perceptions of how the mathematics skills and
competencies of incoming freshmen have changed over the past 10 years indicated no statistically
significant differences, Chi-Square (1, N=77) = 0.18265, p < .91272.


The Chi-Square comparing the two different age categories (35-45 and 46+) with perceptions
of how the mathematics skills and competencies of incoming freshmen have changed over the past
10 years indicated no statistically significant differences, Chi-Square (1, N=77) = 0.01650, p < .89779.




The Chi-Square comparing gender with perceptions of how the mathematics skills and
competencies of incoming freshmen have changed over the past 10 years indicated statistically
significant differences, Chi-Square (1, N=77) = 6.32255, p < .01192.


For question #10, “For English, have the skill sets and competencies of incoming freshman over
the past 10 years: a) improved, b) stayed about the same, c) declined”, none responded
“improved”, 20 “stayed about the same”, and 34 “declined”, for a total of 54 responses. The mean is
2.630, and the mode is 3.0 (i.e., “declined”).


The Chi-Square comparing the four different categories of subject matter taught (English,
mathematics, chemistry, and “other”) with perceptions of how the English skills and
competencies of incoming freshmen have changed over the past 10 years indicated no statistically
significant differences, Chi-Square (1, N=77) = 6.76203, p < .34342.


The Chi-Square comparing the two different age categories (35-45 and 46+) with perceptions
of how the English skills and competencies of incoming freshmen have changed over the past 10 years
indicated no statistically significant differences, Chi-Square (1, N=77) = 4.44899, p < .10812.


The Chi-Square comparing gender with perceptions of how the English skills and
competencies of incoming freshmen have changed over the past 10 years indicated no statistically
significant differences, Chi-Square (1, N=77) = 2.32510, p < .31269.


For question #11, “For chemistry, have the skill sets and competencies of incoming freshman
over the past 10 years: a) improved, b) stayed about the same, c) declined”, none responded
“improved”, 11 “stayed about the same”, and 11 “declined”, for a total of 22 responses. The mean is
2.5, and the mode is 2.5.


The Chi-Square comparing the four different categories of subject matter taught (English,
mathematics, chemistry, and “other”) with perceptions of how the chemistry skills and
competencies of incoming freshmen have changed over the past 10 years indicated no statistically
significant differences, Chi-Square (1, N=77) = 2.2000, p < .33287.


The Chi-Square comparing the two different age categories (35-45 and 46+) with perceptions
of how the chemistry skills and competencies of incoming freshmen have changed over the past 10 years
indicated no statistically significant differences, Chi-Square (1, N=22) = 3.14286, p < .07626.


The Chi-Square comparing gender with perceptions of how the chemistry skills and
competencies of incoming freshmen have changed over the past 10 years indicated no statistically
significant differences, Chi-Square (1, N=22) = 3.14286, p < .07626.


High School English Teachers 


A total of 155 surveys were returned. In some cases individual questions were not responded to. 




Question #1. “In the 1989/90 school year, were you teaching English 30?” 103 responded “yes”
and 51 “no”; there was one non-response.


Question #2 asked the respondents to report how many years they had taught English 30, from
1989/90 to 1999/2000. Seventy-three of 154 (there was one non-response) had taught less than
10 years.


Question #3. “The current English 30 curriculum provides students with the knowledge and
skills needed to continue in related post-secondary studies.”


One responded “strongly disagree”, 7 responded “disagree”, 63 responded “agree”, 76
responded “strongly agree”, for a total of 147 responses. The mean was 3.46 and the mode is 4.0
(strongly agree).


Question #4. “The current English 30 diploma examination adequately reflects the curriculum.”
Four responded “strongly disagree”, 26 responded “disagree”, 75 responded “agree”, 47
responded “strongly agree”, for a total of 152 responses. The mean was 3.09 and the mode is 3.0
(agree).


Question #5. “The current English 30 diploma examination is an adequate measure of the 
knowledge and skills needed to continue in post-secondary studies.” 


Six responded “strongly disagree”, 27 responded “disagree”, 63 responded “agree”, and 48
responded “strongly agree”, for a total of 144 responses. The mean is 3.06 and the mode is 3.0
(agree).


Question #6. “The English 30 diploma examinations are more rigorous now than they were 10 
years ago.” 


Thirteen responded “strongly disagree”, 31 “disagree”, 30 “agree”, and 36 “strongly agree”, for a
total of 110 responses. The mean is 2.81, and the mode is 4.0 (strongly agree).


Question #7. “Marking standards for the English 30 diploma examinations are more rigorous 
now than they were 10 years ago.” 


Thirteen responded “strongly disagree”, 40 “disagree”, 34 “agree”, and 20 “strongly agree”, for a 
total of 107 responses. The mean is 2.57 and the mode is 2.0 (disagree). 


Question #8. “For English, have the skill sets and competencies of the post-secondary freshmen
over the past 10 years: 1) improved, 2) stayed about the same, 3) declined.”


Thirty-nine said “improved”, 36 “stayed about the same”, and 19 “declined”, for a total of 94 
responses. The mean was 1.79, and the mode was 1.00 (improved). 




High School Chemistry Teachers 
A total of 77 surveys were returned. In some cases individual questions were not responded to. 


Question #1. “In the 1989/90 school year, were you teaching Chemistry 30?” 42 responded
“yes”, and 33 “no.”


Question #2 asked the respondents to identify how many years they had taught Chemistry 30, 
from 1989/90 to 1999/2000. Forty-six of 77 had taught less than 10 years. 


Question #3. “The current Chemistry 30 curriculum provides students with the knowledge and 
skills needed to continue in related post-secondary studies.” 


No one responded “strongly disagree”, 3 responded “disagree”, 35 responded “agree”, 32 
responded “strongly agree”, for a total of 70 responses. The mean is 3.41 and the mode is 3.0 
(agree). 


Question #4. “The current Chemistry 30 diploma examination adequately reflects the 
curriculum.” 


One responded “strongly disagree”, 7 responded “disagree”, 37 responded “agree”, and 31
responded “strongly agree”, for a total of 76 responses. The mean is 3.29 and the mode is 3.0
(agree).


Question #5. “The current Chemistry 30 diploma examination is an adequate measure of the 
knowledge and skills needed to continue in post-secondary studies.” 


One responded “strongly disagree”, 7 responded “disagree”, 37 responded “agree”, and 23
responded “strongly agree”, for a total of 68 responses. The mean is 3.21 and the mode is 3.0
(agree).


Question #6. “The Chemistry 30 diploma examinations are more rigorous now than they were 
10 years ago.” 


No one responded “strongly disagree”, 9 “disagree”, 11 “agree”, and 29 “strongly agree”, for a 
total of 49 responses. The mean is 3.41, and the mode is 4.0 (strongly agree). 


Question #7. “Marking standards for the Chemistry 30 diploma examinations are more rigorous 
now than they were 10 years ago.” 


One responded “strongly disagree”, 10 “disagree”, 17 “agree”, and 10 “strongly agree”, for a 
total of 38 responses. The mean is 2.95 and the mode is 3.00 (agree). 


Question #8. “For Chemistry, have the skill sets and competencies of the post-secondary freshmen
over the past 10 years: 1) improved, 2) stayed about the same, 3) declined.”




Twenty-three said “improved”, 12 “stayed about the same”, and 3 “declined”, for a total of 38
responses. The mean was 1.47, and the mode was 1.00 (improved).


High School Mathematics Teachers 
A total of 48 surveys were returned. In some cases individual questions were not responded to. 


Question #1. “In the 1989/90 school year, were you teaching Mathematics 30?” 34 responded
“yes”, and 13 “no.”


Question #2 asked the respondents to identify how many years they had taught Mathematics 30, 
from 1989/90 to 1999/2000. Twenty-two of 47 had taught less than 10 years. 


Question #3. “The current Mathematics 30 curriculum provides students with the knowledge and
skills needed to continue in related post-secondary studies.”


No one responded “strongly disagree”, 10 responded “disagree”, 30 responded “agree”, and 5 
responded “strongly agree”, for a total of 45 responses. The mean was 2.89 and the mode is 3.0 
(agree). 


Question #4. “The current Mathematics 30 diploma examination adequately reflects the 
curriculum.” 


One responded “strongly disagree”, 2 responded “disagree”, 25 responded “agree”, and 20 
responded “strongly agree”, for a total of 48 responses. The mean is 3.33 and the mode is 3.0 
(agree). 


Question #5. “The current Mathematics 30 diploma examination is an adequate measure of the 
knowledge and skills needed to continue in post-secondary studies.” 


No one responded “strongly disagree”, 15 responded “disagree”, 22 responded “agree”, and 8
responded “strongly agree”, for a total of 45 responses. The mean is 2.84 and the mode is 3.0
(agree).


Question #6. “The Mathematics 30 diploma examinations are more rigorous now than they 
were 10 years ago.” 


One responded “strongly disagree”, eight “disagree”, 10 “agree”, and 21 “strongly agree”, for a
total of 40 responses. The mean is 3.28, and the mode is 4.0 (strongly agree).


Question #7. “Marking standards for the Mathematics 30 diploma examinations are more 
rigorous now than they were 10 years ago.” 




Two responded “strongly disagree”, 17 “disagree”, 7 “agree”, and 6 “strongly agree”, for a total
of 32 responses. The mean is 2.53 and the mode is 2.0 (disagree).


Question #8. “For Mathematics, have the skill sets and competencies of the post-secondary
freshmen over the past 10 years: 1) improved, 2) stayed about the same, 3) declined.”


Eight said “improved”, 15 “stayed about the same”, and 14 “declined”, for a total of 37 
responses. The mean was 2.16, and the mode was 2.00 (stayed about the same). 


Limitations 


1. The college, university, and technical institute instructor sample was not randomly selected. 
This could impair generalizability. 


2. Each of the three high school teacher surveys contained many participants with less than 10
years of teaching experience and/or who had not been teaching in the baseline year of 1989/90. Those
participants with less than 10 years (especially those with only 2 or 3 years of experience) are
unlikely to have been able to form an informed opinion about a trend over the time period in
question.


3. The high school samples have a potential “double” vested interest compared with the post-secondary
sample. First, they are the providers of the service under consideration. Second, as markers of
the departmental exams, they are more likely than the population of teachers as a whole to be
supporters or defenders of the exams or the marking practices thereof.


Discussion 


College, University, and Technical Institute Instructors 


Question #6: The 90.9% affirmative response rate suggests that post-secondary instructors
overwhelmingly perceive grade 12 proficiency in English as a necessary prerequisite for their
post-secondary subject areas. The respective Chi-Squares indicate that these results are not limited by
subject matter taught, age, or gender.


Question #7: The 74.0% affirmative response rate suggests that the preponderance of post-secondary
instructors perceive grade 12 proficiency in mathematics as a necessary prerequisite for their
post-secondary subject areas. The Chi-squares indicated that these results were not limited by
gender or age. Instructors’ responses, however, were influenced by the subject matter taught,
p < .000001. A Chi-square test treats a difference as statistically significant when
the spread between expected values and actual values is too great to be achieved by chance
alone. With respect to question #7, all of the chemistry and mathematics instructors responded
that grade 12 mathematics proficiency was necessary, while only two of 11 English instructors
did likewise.




Question #8: The 37.7% affirmative response rate suggests that the majority of post-secondary
instructors do not perceive grade 12 proficiency in chemistry as a necessary prerequisite for their
post-secondary subject areas. The Chi-squares indicated that these results were not limited by
gender or age. Instructors’ responses, however, were influenced by the subject matter taught,
p < .000001. A Chi-square test treats a difference as statistically significant when
the spread between expected values and actual values is too great to be achieved by chance
alone. With respect to question #8, all 21 of the chemistry instructors responded that grade 12
chemistry proficiency was necessary, while only one of the 11 English instructors, one of 9
mathematics instructors, and six of 36 “other” (subject area) instructors responded in the
affirmative.
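
The question #8 counts given above are enough to rebuild the 4 x 2 table of subject taught by response and recompute the Chi-Square statistic. The sketch below is a plain Pearson test of independence, not the SPSS output itself. The "no" counts are an assumption, derived as each group size minus its "yes" count, and the function name `chi_square_independence` is hypothetical.

```python
# Observed (yes, no) counts per subject taught, rebuilt from the text above.
# The "no" counts are assumed to be group size minus the number of "yes" answers.
observed = {
    "chemistry": (21, 0),
    "English": (1, 10),
    "mathematics": (1, 8),
    "other": (6, 30),
}


def chi_square_independence(table: dict[str, tuple[int, ...]]) -> tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    rows = list(table.values())
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for row, row_total in zip(rows, row_totals):
        for count, col_total in zip(row, col_totals):
            # Expected count if response were independent of subject taught.
            expected = row_total * col_total / grand_total
            statistic += (count - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(col_totals) - 1)
    return statistic, df


statistic, df = chi_square_independence(observed)
print(f"Chi-Square ({df}, N={sum(map(sum, observed.values()))}) = {statistic:.5f}")
```

Under these assumptions the table totals 29 "yes" and 48 "no" responses, the same totals reported for question #8 in the Results, and the statistic can be compared with the value reported there.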


Question #9: On a scale of one to 3, with one representing “improved”, two representing “stayed 
about the same”, and three representing “declined”, the overall mean of 2.432 and mode of 3.0, 
strongly suggest that post-secondary instructors perceive a decline in the mathematics skill sets 
and competencies of incoming freshmen. 

Question #10: On a scale of one to 3, with one representing “improved”, two representing
“stayed about the same”, and three representing “declined”, the overall mean of 2.630 and mode
of 3.0 strongly suggest that post-secondary instructors perceive a decline in the English skill sets
and competencies of incoming freshmen.


Question #11: On a scale of one to 3, with one representing “improved”, two representing
“stayed about the same”, and three representing “declined”, the overall mean of 2.50 and mode 
of 2.5, strongly suggest that post-secondary instructors perceive a decline in the chemistry skill 
sets and competencies of incoming freshmen. 


High School English 30 Teachers 


Questions #1 and #2: A much larger than desirable portion of the sample either was not teaching 
in the baseline year of 1989/90, or had fewer than 10 consecutive years of teaching English 30. 
This, in the opinion of this researcher, seriously impairs the ability of these respondents to form 
an informed opinion about a performance trend in English 30. 


Questions #3-7 were marked on a four-point scale, with one representing “strongly disagree”, two 
“disagree”, three “agree”, and four “strongly agree.” 


In all but one of these questions, both the mean and the mode suggest that the majority of 
English 30 teachers believe: 1) the English 30 curriculum provides students with the knowledge 
and skills to continue in related post-secondary studies, 2) the English 30 diploma examination 
adequately reflects the curriculum, 3) the current English 30 diploma examination is an adequate 
measure of the knowledge and skills needed to continue in post-secondary studies, and 4) the 
English 30 diploma examinations are more rigorous now than they were 10 years ago. The 
exception is the statement that the marking standards for the English 30 diploma examinations 
are more rigorous than they were 10 years ago (disagreement: mode = 1), although the 
distribution of responses is essentially split between agree and disagree. 


Question 8: “For English, have the skill sets and competencies of the post-secondary freshmen 
over the past 10 years, 1) improved, 2) stayed about the same, 3) declined.” Question 8 was 
marked on a three-point scale with one representing “improved”, two “stayed about the same”, 
and three “declined.” 


This question was the same as question 10 on the college, university and technical instructor 
survey, which allows for comparison and contrast between the two samples. The mean and mode 
for the English 30 teachers were 1.79 and 1.00 respectively. This suggests that these respondents 
feel there has been an overall improvement in the English skill sets and competencies over the 
past 10 years. In stark contrast, the instructors who receive the freshmen into their classes 
overwhelmingly perceived a decline in English competencies and skill sets over the past 10 years 
(mean 2.63, mode 3.0). 


High School Mathematics 30 Teachers 


Questions #1 and #2: A much larger than desirable portion of the sample either was not teaching 
in the baseline year of 1989/90, or had fewer than 10 consecutive years of teaching Mathematics 
30. This, in the opinion of this researcher, seriously impairs the ability of these respondents to 
form an informed opinion about a performance trend in Mathematics 30. 


Questions #3-7 were marked on a four-point scale, with one representing “strongly disagree”, two 
“disagree”, three “agree”, and four “strongly agree.” 


In each of these questions, both the mean and the mode suggest that the majority of Mathematics 
30 teachers believe: 1) the Mathematics 30 curriculum provides students with the knowledge and 
skills to continue in related post-secondary studies, 2) the Mathematics 30 diploma examination 
adequately reflects the curriculum, 3) the current Mathematics 30 diploma examination is an 
adequate measure of the knowledge and skills needed to continue in post-secondary studies, 
4) the Mathematics 30 diploma examinations are more rigorous now than they were 10 years 
ago, and 5) the marking standards for the Mathematics 30 diploma examinations are more 
rigorous than they were 10 years ago. 


Question 8: “For Mathematics, have the skill sets and competencies of the post-secondary 
freshmen over the past 10 years, 1) improved, 2) stayed about the same, 3) declined.” Question 8 
was marked on a three-point scale with one representing “improved”, two “stayed about the 
same”, and three “declined.” 


This question was the same as question 9 on the college, university and technical instructor 
survey, which allows for comparison and contrast between the two samples. The mean and mode 
for the Mathematics 30 teachers were 2.16 and 2.00 respectively. This suggests that these 
respondents feel that the mathematics skill sets and competencies of post-secondary freshmen 
have remained somewhat constant over the past 10 years. In contrast, the instructors who 
receive the freshmen into their classes overwhelmingly perceived a decline in mathematics 
competencies and skill sets over the past 10 years (mean 2.43, mode 3.0). 




High School Chemistry 30 Teachers 


Questions #1 and #2: A much larger than desirable portion of the sample either was not teaching 
in the baseline year of 1989/90, or had fewer than 10 consecutive years of teaching Chemistry 30. 
This, in the opinion of this researcher, seriously impairs the ability of these respondents to form 
an informed opinion about a performance trend in Chemistry 30. 


Questions #3-7 were marked on a four-point scale, with one representing “strongly disagree”, two 
“disagree”, three “agree”, and four “strongly agree.” 


In each of these questions, both the mean and the mode suggest that the majority of Chemistry 30 
teachers believe: 1) the Chemistry 30 curriculum provides students with the knowledge and skills 
to continue in related post-secondary studies, 2) the Chemistry 30 diploma examination 
adequately reflects the curriculum, 3) the current Chemistry 30 diploma examination is an 
adequate measure of the knowledge and skills needed to continue in post-secondary studies, 4) 
the Chemistry 30 diploma examinations are more rigorous now than they were 10 years ago, and 
5) the marking standards for the Chemistry 30 diploma examinations are more rigorous than they 
were 10 years ago. 


Question 8: “For Chemistry, have the skill sets and competencies of the post-secondary freshmen 
over the past 10 years, 1) improved, 2) stayed about the same, 3) declined.” Question 8 was 
marked on a three-point scale with one representing “improved”, two “stayed about the same”, 
and three “declined.” 


This question was the same as question 11 on the college, university and technical instructor 
survey, which allows for comparison and contrast between the two samples. The mean and mode 
for the Chemistry 30 teachers were 1.47 and 1.00 respectively. This suggests that these 
respondents feel there has clearly been an overall improvement in the chemistry skill sets and 
competencies over the past 10 years. In marked contrast, the instructors who receive the 
freshmen into their classes overwhelmingly perceived a decline in chemistry competencies and 
skill sets over the past 10 years (mean 2.50, mode 2.5). 


Conclusions and Recommendations 


This probe suggests that those who create and teach the respective programs and mark the 
English 30, Mathematics 30, and Chemistry 30 diploma examinations perceive that they are 
producing graduates with the competencies and skill sets to succeed in post-secondary studies. 
On the other hand, the instructors who receive these same graduates express dissatisfaction with 
the quality of knowledge possessed by the incoming freshmen. 


This discrepancy requires further investigation. This researcher recommends a larger, more 
comprehensive study be undertaken to see whether the results apply when more generalizable 
samples are used. 


To conclude, if a further study bears the same dichotomous results, some thought should be 
given to allowing the end users of the educational product/service (college, university, and 
technical institute instructors) to have significant, meaningful input into the structure of the 
curriculum design and the marking standards of diploma examinations for high school subjects. 




